INSECT VIRUSES AND THEIR USES IN PROTECTING PLANTS



FIELD OF THE INVENTION 

The present invention relates to insect viruses useful in the control of insect attack
on plants. It particularly relates to biological insecticides, especially those
comprised of insect viruses. In particular applications, the invention also
provides recombinant viruses and transgenic plants.

BACKGROUND OF THE INVENTION 

There is increasing awareness of the desirability of insect pest control by
biological agents. Considerable effort in recent years has been devoted to the
identification and exploitation of DNA viruses with large genomes, especially
the baculoviruses. These viruses generally require extensive genetic
manipulation to become effective insecticides, and the first such modified
viruses are only now being evaluated.

In contrast, very little effort has been devoted to the study and use of small 
viruses with RNA genomes. 

Four main groups of small RNA viruses have been isolated from insects.

These include members of the Picornaviridae, the Nodaviridae, the Tetraviridae
and the unclassified viruses. Descriptions of these groups can be found in the
Atlas of Invertebrate Viruses (eds J.R. Adams and J.R. Bonami) (CRC Press,
Boca Raton, 1991) and Viruses of Invertebrates (ed E. Kurstak) (Marcel
Dekker, New York, 1991). These disclosures concern the pathology and
biology of these viruses, not their use in biological control.



SUMMARY OF THE INVENTION

In a first aspect of the present invention there is provided an isolated small
RNA virus capable of infecting insect species including Heliothis species.

In one particular embodiment, the present invention provides an isolated
preparation of Heliothis armigera stunt virus, referred to herein as "HaSV".

In a further aspect of the present invention there is provided an isolated
nucleic acid molecule comprising a nucleic acid sequence hybridizable with
RNA 1 or RNA 2 described herein under low stringency conditions.

In still a further aspect the invention provides a vector comprising a nucleic
acid molecule, the sequence of which is hybridizable with RNA 1 or RNA 2 as
described herein. These vectors include expression and transfer vectors for use
in animal (including insect), plant and bacterial cells.

In a further aspect the invention provides an isolated protein or polypeptide
preparation of the proteins or polypeptides derivable from the isolated virus of
the present invention. The invention also extends to antibodies specific for the
protein and polypeptide preparations.

In a yet further aspect the invention provides a recombinant insect virus vector 
incorporating all or a part of the isolated virus of the present invention. 


In a still further aspect of the present invention there is provided a method of
controlling insect attack in a plant comprising genetically manipulating said
plant so that it is capable of expressing HaSV or mutants, derivatives or
variants thereof, or an insecticidally effective portion of HaSV or of mutants,
variants or derivatives thereof, and optionally other insecticidally effective
agents, such that insects feeding on the plants are deleteriously affected.

In another aspect of the present invention there is provided a preparation of
HaSV or a mutant, variant or derivative thereof, or an insecticidally effective
portion of HaSV or of a mutant, variant or derivative thereof, suitable for
application to plants, wherein the preparation is capable of imparting an
insect protective effect.

BRIEF DESCRIPTION OF FIGURES

Figure 1 is the complete sequence of RNA 1 and of its major encoded
polypeptide.

Figure 2 is the complete sequence of RNA 2 in the authentic version, and its
encoded polypeptides (the RNA 2 variant called the "5C version" is also shown
around nucleotide position 570; the amino acid sequence encoded by the 5C
version is not included but may be deduced from the nucleotide sequence
given).

Figure 3 shows bioassay data demonstrating HaSV-induced stunting of larvae.

Figure 4 is a schematic representation of the proteins encoded by RNA 1 and 
RNA 2. 

Figure 5 is a schematic representation of the proteins expressed from RNA 2 in
bacteria. DNA fragments encoding P17, P71, P64, P7 and the fusion protein
P70 were synthesised by PCR. The flanking NdeI and BamHI sites used in
cloning are indicated (note that P17 is followed by a BglII site, whose
cohesive ends are compatible with those of BamHI).

Figure 6 illustrates the 3'-terminal secondary structure of HaSV RNAs. The
tRNA-like structures at the 3' ends of RNAs 1 and 2 are shown. Residues in
bold are common to both sequences.

Figure 7 Expression strategies for HaSV cDNAs in insect cells. The upper
part of the figure shows the genome organization of RNAs 1 and 2. The lower
part shows insertion of cDNAs corresponding to these RNAs into a plasmid
vector, between the heat shock protein (HSP70) promoter of Drosophila and
a suitable polyadenylation (pA) signal. The HSP promoter was obtained by
PCR using suitable primers, with a BamHI site inserted by PCR immediately
upstream of the start of transcription, giving the following sequence:
GGATCCACAQnnn, where the underlined residue is the transcription start
site for either RNA. The cDNAs are terminated by ClaI sites, allowing direct
linkage to ribozyme sequences as described in the text.

Figure 8 Ribozymes to yield correct 3' ends. The sequences of the ribozymes
inserted as short cDNA fragments into HaSV cDNA clones are shown. The
ribozyme fragments were assembled and cloned as described in the text.
Designed self-cleavage points are indicated by bold arrows.

Figure 9 Immunoblots to map epitopes on HaSV. A. Detected with HaSV
antiserum. Lane 1: pTP70delSP; lane 2: pTP70; lane 3: pTP17; lane 4: control;
lane 5: pTP70delN; lane 6: pTP70; lane 7: pTP71; lane 8: HaSV virions;
lane 9: molecular weight markers. B. Detected with HaSV antiserum. Lane 1:
pTP70delN; lane 2: pTP70delSPN; lane 3: pTP70. C. Detected with an
antiserum to the Bt toxin (CryIA(c)). Lane 1: pTP70; lane 2: HaSV virions;
lane 3: control extract.

Figure 10 New field isolates of HaSV. The genomic organization of RNA 2 is
shown at the top of the Figure. PCR using appropriate primers with BamHI
restriction sites, and in some cases altered context sequences of the AUG
initiating translation of the P17 or P71 genes, was used to make fragments for
cloning into the BamHI sites of the expression vectors. Constructs 17E71 and
P71 have altered context sequences of the AUG initiating translation of the
P17 and P71 genes respectively; these alterations correspond to the context
derived from the JHE gene (see text). All context sequences are given on the
right of the Figure. R2 is a clone of the complete RNA 2 sequence as a BamHI
fragment in the vector.

Figure 11 Maps of the expression constructs in baculovirus vectors.

Figure 12 a to e Various strategies utilising the present invention.

Figure 13 Expression of RNAs 1 and 2 from baculovirus vectors. The
full-length cDNA clone of HaSV RNA 1 or 2 was inserted as a BamHI fragment
into the baculovirus expression vectors. PCR was used to add BamHI sites
immediately adjacent to the 5' and 3' termini of the RNA 1 sequence;
sequences of the primers are given in the text. Constructs R1RZ and R2RZ
carry cis-acting ribozymes immediately adjacent to the 3' end of the sequence
of RNA 1 and 2 respectively.

Figure 14 Expression strategies for HaSV cDNAs in plant cells. The upper
part of the Figure shows the genome organization of RNAs 1 and 2. The
lower part shows insertion of cDNAs corresponding to these RNAs into a
plasmid vector, between the 35S promoter of cauliflower mosaic virus and the
polyadenylation (pA) signal on plasmid pDH51 (Pietrzak et al., 1986). The
cDNAs were obtained by PCR using suitable primers, with a BamHI site
inserted by PCR immediately upstream of the start of each cDNA. The
cDNAs are terminated by ClaI sites, allowing direct linkage to ribozyme
sequences as described in the text.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

A first aspect of the invention contemplates use of small RNA viruses for
biological control of insects. In particular, in accordance with the first aspect
of this invention there is provided an isolated small RNA virus, particularly
H. armigera stunt virus or mutants, variants or derivatives thereof, capable of
infecting insects, in particular insect species such as Helicoverpa armigera.
The small RNA virus isolate of the instant invention is insecticidal and in
particular stunts the growth of insect larvae, for example Helicoverpa armigera
larvae, and inhibits or prevents their development into the adult stage.

The small RNA viruses of the instant invention have insecticidal, anti-feeding,
gut-binding, synergistic or other properties useful for insect control.

In particular, Helicoverpa armigera stunt virus (HaSV) particles are isometric
and approximately 36 nm in diameter with a buoyant density on CsCl gradients 
of 1.36 g/ml. The virus is composed of two major capsid proteins of
approximately 64 and 7 kDa in size as determined by SDS-PAGE. The HaSV
genome is much larger than that of the largest known nodavirus (another class
of RNA viruses) and comprises two ss (+) RNA molecules of approximately
5.3 and 2.4 kb. The genome appears to lack the blockage of unknown structure
at the 3' termini that is found in Nodaviridae. The HaSV genome, however,
shares a capped structure and non-polyadenylation with Nodaviridae. HaSV
differs significantly from Nodaviridae and Nudaurelia ω virus in terms of its
immunological properties. In particular the large capsid protein has different
antigenic determinants. Other properties of HaSV are described in the
Examples.
Examples. 

The host range of HaSV includes Lepidopterans such as those from the subfamily
Heliothinae. Species known to be hosts are Helicoverpa (Heliothis) armigera,
H. punctigera, H. zea, Heliothis virescens and other noctuids such as Spodoptera
exigua. H. armigera, which is known by the common names corn earworm,
cotton bollworm, tomato grub and tobacco budworm, is a pest of economic
significance in most countries. H. punctigera, the native budworm, is a pest of
great economic significance in Australia. Members of the Heliothinae, which
include Helicoverpa and Heliothis, and especially H. armigera, are among
the most important and widespread pests in the world. In the US, Heliothis
virescens and Helicoverpa zea are particularly important pests.

The first aspect of the invention provides an isolated small RNA virus capable
of infecting insects including Heliothis species. In a particularly preferred form
the invention relates to mutants, variants and derivatives of HaSV. The terms
"mutant", "variant" and "derivative" include all naturally occurring and artificially
created viruses or viral components which differ from the HaSV isolate as
herein described in nucleotide content or sequence, amino acid content or
sequence, immunological reactivity, non-glycosylation or glycosylation pattern
and/or infectivity but generally retain insecticidal activity. Specifically, the
terms "mutant", "variant" and "derivative" of HaSV cover small RNA viruses
which have one or more functional characteristics of HaSV described herein.

Examples of mutants, variants or derivatives of HaSV include small RNA
viruses that have different nucleic acid or amino acid sequences from HaSV but
retain one or more functional features of HaSV. These may include strains
with genetically silent substitutions, strains carrying replication and
encapsidation sequences and signals that are functionally related to HaSV, or
strains that carry functionally related protein domains.

In a preferred aspect the invention relates to mutants, variants or derivatives
of HaSV which encode replication or encapsidation sequences, structures or
signals with 60%, preferably 70%, more preferably 80%, still more preferably
90% and even more preferably 95% nucleotide sequence identity to the
nucleotide sequences of HaSV.

In another preferred aspect the invention relates to mutants, variants or
derivatives of HaSV which encode proteins with at least 50%, preferably 60%,
more preferably 70%, still more preferably 80%, yet more preferably 90% and
even more preferably 95% amino acid sequence identity to proteins or
polypeptides of HaSV.



In another preferred aspect the invention relates to mutants, variants or
derivatives of HaSV with 50%, more preferably 60%, still more preferably
70%, yet more preferably 80%, and still more preferably 90 or 95% nucleotide
sequence identity to the following biologically active domains encoded by the
HaSV genome:

RNA 1 (in the replicase):
    amino acid residues 401 to 600, or the other domains in the replicase

RNA 2 (in the capsid protein):
    amino acid residues 273 to 435
    amino acid residues 50 to 272
    amino acid residues 436 to the COOH terminus
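The identity thresholds above are given as percentages without naming an alignment method or a formula. As a minimal sketch only, assuming the two sequences have already been aligned to equal length with '-' marking gaps, percent identity can be taken as identical non-gap columns divided by total alignment columns, and the result compared with the stated tiers. The function names, the gap convention and the choice of denominator are illustrative assumptions; other conventions (for example, excluding gap columns from the denominator) give different numbers.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: a column counts as identical only if both residues
    match and are not gaps ('-'); every alignment column is in the denominator.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


def highest_threshold_met(identity: float, thresholds=(50, 60, 70, 80, 90, 95)):
    """Return the largest threshold (in %) that the identity reaches, or None."""
    met = [t for t in thresholds if identity >= t]
    return max(met) if met else None


# Hypothetical example: two nine-residue peptides differing at one position.
pid = percent_identity("MGDAGVASQ", "MEDAGVASQ")
print(round(pid, 1), highest_threshold_met(pid))
```

In this example one mismatch in nine positions gives about 88.9% identity, which reaches the 80% tier but not the 90% tier.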



Preferably the viral isolate of the present invention is biologically pure, which
means a preparation of the virus comprising at least 20% virus relative to other
components, as determined by weight, viral activity or any other convenient
means. More preferably the isolate is 50% pure, still more preferably 60%,
even more preferably 70%, still more preferably 80% and even more
preferably 90% or more pure.

In a second aspect the present invention relates to a nucleotide sequence or
sequences hybridizable with those of HaSV. The term nucleotide sequence
used herein includes RNA, DNA, cDNA and nucleotide sequences
complementary thereto. Such nucleotide sequences also include single- or
double-stranded nucleic acid molecules and linear and covalently closed
circular molecules. The nucleic acid sequences may be the same as the HaSV
sequences as herein described or may contain single or multiple nucleotide
substitutions and/or deletions and/or additions thereto. The term nucleotide
sequence also includes sequences with sufficient homology to hybridize with
the nucleotide sequence under low, preferably medium and most preferably
high stringency conditions (Sambrook, J., Fritsch, E.F. & Maniatis, T. (1989)
Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor
Laboratory Press), and nucleotide sequences encoding functionally
equivalent sequences. In a still more preferred embodiment the invention
comprises the nucleotide sequences of genome components 1 and 2 as
represented by Figures 1 and 2 hereinafter, or parts thereof, or mutants,
variants or derivatives thereof. The terms "mutants", "variants" or "derivatives"
of nucleotide genome components 1 and 2 have the same meaning, when
applied to nucleotide sequences, as that given above and include parts of
genome components 1 and 2.

The second aspect of the invention also relates to nucleotide signals, sequences
or structures which enable the nucleic acid on which they are present to be
replicated by HaSV replicase. Furthermore the invention relates to the
nucleotide signals, sequences or structures which enable nucleic acids on which
they are present to be encapsidated.

In a particularly preferred embodiment of the second aspect, the invention
comprises nucleotide sequences which are mutants of the capsid gene having
the following sequences:

ATG GGC GAT GCC GGC GTC GCG TCA CAG
ATG GAG GAT GCT GGA GTG GCG TCA CAG
ATG AGC GAG GCC GGC GTC GCG TCA CAG
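The three capsid-gene variants above each consist of nine codons beginning with the initiating ATG. To make the differences visible at the protein level, a minimal sketch using the standard genetic code is shown below; it assumes the sequences are read in frame from the ATG and that the seventh codon of every variant is GCG. The helper name `translate` is illustrative, not part of the specification.

```python
from itertools import product

# Standard genetic code (NCBI table 1): amino acids listed for codons with
# T, C, A, G order at each of the three positions; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA (or RNA) sequence; whitespace is ignored."""
    seq = "".join(dna.split()).upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


capsid_variants = [
    "ATG GGC GAT GCC GGC GTC GCG TCA CAG",
    "ATG GAG GAT GCT GGA GTG GCG TCA CAG",
    "ATG AGC GAG GCC GGC GTC GCG TCA CAG",
]

for variant in capsid_variants:
    print(translate(variant))
# prints MGDAGVASQ, MEDAGVASQ and MSEAGVASQ on separate lines
```

Read this way, the variants differ only at the second and third residues (G/D, E/D and S/E), while the remaining AGVASQ stretch is conserved.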

In a preferred aspect the invention relates to nucleotide sequences of HaSV
encoding insecticidal activity, including the capsid protein gene and P17, and
mutants, variants and derivatives thereof.

In another preferred aspect the invention comprises nucleotide sequences
including the following ribozyme oligonucleotides:

5' CCATCGATGCCGGACTGGTATCCCAGGGGG (called "HVR1Cla" herein)

5' CCATCGATGCCGGACTGGTATCCCGAGGGAC (called "HVR2Cla" herein)

5' CCATCGATGATCCAGCCTCCTCGCGGCGCCGGATGGGCA (called "RZHDV1" herein)

5' GCTCTAGATCCATTCGCCATCCGAAGATGCCCATCCGGC (called "RZHDV2" herein)

5' CCATCGATTTATGCCGAGAAGGTAACCAGAGAAACACAC (called "RZHC1" herein)

5' GCTCTAGACCAGGTAATATACCACAACGTGTGTTTCTCT (called "RZHC2" herein)

It is desirable to cleave the transcript downstream of its tRNA-like structure
or poly(A) tail prior to translation, replication or encapsidation of the
transcript. Ribozyme sequences are therefore useful for obtaining translation,
replication and encapsidation of the transcript.
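The figure legends state that the cDNAs are terminated by ClaI sites to allow direct linkage to ribozyme sequences. A minimal sketch for checking the ribozyme oligonucleotides listed above is shown below: it reports the positions of a few common restriction sites and measures how many 3'-terminal nucleotides of one oligonucleotide are complementary to the 3' end of another. The enzyme table and the helper names are assumptions chosen for illustration. All six recognition sequences in the table are palindromic, so scanning one strand finds sites on both.

```python
# Illustrative selection of restriction enzymes; every site listed is palindromic.
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "ClaI": "ATCGAT",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "NdeI": "CATATG",
    "XbaI": "TCTAGA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3' (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(oligo: str) -> dict:
    """Map enzyme name -> 0-based start positions of its site in the oligo."""
    seq = oligo.upper()
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq.startswith(site, i)]
        if positions:
            hits[enzyme] = positions
    return hits


def three_prime_overlap(a: str, b: str) -> int:
    """Longest k such that the last k nt of a pair antiparallel with the last k nt of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if reverse_complement(a[-k:]) == b[-k:].upper():
            return k
    return 0


RZHDV1 = "CCATCGATGATCCAGCCTCCTCGCGGCGCCGGATGGGCA"
RZHDV2 = "GCTCTAGATCCATTCGCCATCCGAAGATGCCCATCCGGC"
RZHC1 = "CCATCGATTTATGCCGAGAAGGTAACCAGAGAAACACAC"

print(find_sites(RZHDV1))                   # {'ClaI': [2]}
print(find_sites(RZHDV2))                   # {'XbaI': [2]}
print(find_sites(RZHC1))                    # {'ClaI': [2]}
print(three_prime_overlap(RZHDV1, RZHDV2))  # 12
```

On this reading, RZHDV1 and RZHC1 carry a ClaI site (ATCGAT) after their first two bases and RZHDV2 carries an XbaI site (TCTAGA) in the same place, and the 3' ends of RZHDV1 and RZHDV2 are complementary over 12 nucleotides. That appears consistent with the pair being assembled by annealing, although the source does not spell out the assembly method here.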



WO 94/04660    PCT/AU93/00411

The present invention also extends to oligonucleotide primers for the
above sequences, antisense sequences and nucleotide probes for the above
sequences, and homologues and analogues of said primers, antisense sequences
and probes. Such primers and probes are useful in the identification, isolation
and/or cloning of genes encoding insecticidally effective proteins or proteins
required for viral activity, from HaSV or another virus (whether related or
unrelated) carrying a similar gene or similar RNA sequence. They are also
useful in screening for HaSV or other viruses in the field or in identifying
HaSV or other viruses in insects, especially in order to identify related viruses
capable of causing pathogenic effects similar to those of HaSV.

Any pair of oligonucleotide primers derived from either RNA 1 or RNA 2 and
located between ca 300 and 1500 bp apart can be used as primers. The
following pairs of primer sequences exemplify particularly preferred
embodiments of the present invention.

Specifically for RNA 1:

1. HVR1B5' (described below) and the primer complementary to
nucleotides 1192-1212 of Figure 1.

2. The primer corresponding to nucleotides 4084 to 4100 of Fig. 13 and
the primer HvR13p described below.

Specifically for RNA 2:

1. The primer corresponding to nucleotides 459 to 476 of Fig. 2 and the
primer complementary to nucleotides 1653 to 1669 of Fig. 2 (this would
include the central variable domain).

2. R2cdna5 and the primer complementary to nucleotides 1156 to 1172 of
Fig. 2.

3. The primer corresponding to nucleotides 1178 to 1194 and the primer
complementary to nucleotides 2072 to 2091 (of Fig. 2).

Other combinations giving shorter fragments are also possible.
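Because the RNA 2 primer pairs above are given by nucleotide coordinates, the expected product size can be worked out directly: with a forward primer whose 5' end sits at position f and a reverse primer complementary to a stretch ending at position r, the product spans r - f + 1 nucleotides. The sketch below assumes 1-based, inclusive numbering for Fig. 2 and reads "between ca 300 and 1500 bp apart" as a product length in that range; both are assumptions, and the helper names are hypothetical. RNA 2 pair 2 is omitted because the position of its forward primer is not given here.

```python
def amplicon_length(forward_start: int, reverse_end: int) -> int:
    """Length of a PCR product on 1-based, inclusive template coordinates.

    forward_start: template position of the forward primer's 5' end.
    reverse_end:   last template position covered by the reverse primer.
    """
    if reverse_end < forward_start:
        raise ValueError("reverse primer must lie downstream of the forward primer")
    return reverse_end - forward_start + 1


def in_preferred_range(length: int, low: int = 300, high: int = 1500) -> bool:
    """True if the product length lies in the (approximate) preferred range."""
    return low <= length <= high


rna2_pairs = {
    "RNA 2 pair 1 (459-476 / 1653-1669)": (459, 1669),
    "RNA 2 pair 3 (1178-1194 / 2072-2091)": (1178, 2091),
}

for label, (start, end) in rna2_pairs.items():
    n = amplicon_length(start, end)
    print(f"{label}: {n} bp, within ~300-1500 bp: {in_preferred_range(n)}")
```

This gives 1211 bp for pair 1 and 914 bp for pair 3, both inside the stated range.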

Further preferred primers include:

5' GGGGGGAATTCATTTAGGTGACACTATAGTTCTGCCTCCCCGGAC
(called "HvR1SP5p" herein)

5' GGGGGGATCCTGGTATCCCAGGGGGGC (called "HvR13p" herein)

5' CCXSGAAGCTTCriTl TCI 1 ICl 1 lACCA (called "R2cdna5" herein)

5' GGGGGATCCGATGGTATCCCGAGGGACGCTCAGCAGGTGGCATAGG (called "HvR23p" herein)

AAATAATirroTTACnTTAGAAGGAGATATACATATOAOCGAGCGAGCACAC (called "HVPET65N" herein)

AAATAATTTrGTrTAACCTTAAGAAGGAGATCTACATATGCrGGAGTnncnrCAC (called "HVPET63N" herein)

noAnATYn-ArATA-mnnAnATGrTGOAnTG (called "HVPET64N" herein)

GTAGCGAACGTCGAGAA (called "HVRNA2F3" herein)

nnnnnATrrr rAnTTnTrAnTGnrn GflnTAG (called "HVP65C" herein)

nannATrr cTAATTGarArnAnCGGCGC (called "HVP6C2" herein)

AATTACATATGGCGGCCGCCGnTCTGCC (called "HVP6MA" herein)

AATTACATATGirCGCGGCCGCCGnTCT (called "HVP6MF" herein)

The invention also relates to vectors encoding the nucleotide sequences
described above and to host cells including the same. Preferably these vectors
are capable of expression in animal, plant or bacterial cells or are capable of
transferring the sequences of the present invention to the genome of other
organisms such as plants. More preferably they are capable of expression in
insect and crop plant cells.

In a preferred aspect the invention relates to the vectors pDHVR1,
pDHVR1RZ, pDHVR2, pDHVR2RZ, p17V71, p17E71, pPH, pV71, p17V64,
p17E64, pP64, pV64, pBacHVR1, pBacHVR1RZ, pBacHVR2, pBacHVR2RZ,
pHSPR1, pHSPR1RZ, pHSPR2, pHSPR2RZ, pSR1(E3)A, pSR1(E3)B,
pSR2A, pSR2B, pSX2P70, pSXR2P70, pSRP2B, pBHVR1B, pBHVR2B,
pT7T2P64, pSR2P70, pT7T2P65, pT7T2P70, pT7T2-P71, pBSKSE3, pBSR1S,
pBSR25p, pSR25, phi236P70, phr235P65, pGemP63N, pGemP64N,
pGemP65N, pP64N, pP65H, pTP6MA, pTP6MF, pTP17, pTP17delBB, pP656
or p70G as described hereinafter.

In a third aspect the invention relates to polypeptides or proteins encoded by
HaSV and to homologues and analogues thereof. This aspect of the invention
also relates to derivatives and variants of the polypeptides and proteins of
HaSV. Such derivatives and variants include substitutions and/or deletions of
one or more amino acids, and amino- and carboxy-terminal fusions with other
polypeptides or proteins. In a preferred aspect the invention relates to the
proteins P7, P16, P17, P64, P70, P71, P11a, P11b, P14 and P187 described
herein and to homologues and analogues thereof, including fusion proteins,
particularly of P71, such as P70 described herein. In a most preferred aspect
the invention relates to polypeptides or proteins from HaSV which have
insecticidal activity themselves or provide target specificity for insecticidal
agents. In particular the invention relates to polypeptides or fragments thereof
with insect gut binding specificity, particularly to the variable domains thereof
as herein described. Homologues and analogues of the polypeptides and
proteins having said insecticidal activity are also included within the
scope of the invention. In addition, the invention relates to antibodies
(such as monoclonal or polyclonal antibodies or chimeric antibodies, including
phage antibodies produced in bacteria) specific for said polypeptide and
protein sequences. Such antibodies are useful in detecting HaSV and related
viruses or the protein products thereof.

In a fourth aspect the invention provides an infectious, recombinant insect
virus including a vector, an expressible nucleic acid sequence comprising all or
a portion of the HaSV genome, including an insecticidally effective portion
of the genome, and optionally material derived from another insect virus
species or isolate(s).

Insect virus vectors suitable for the invention according to this aspect include
baculoviruses, entomopoxviruses and cytoplasmic polyhedrosis viruses. Most
preferably, the insect virus vector is selected from the group comprising the
baculovirus genera of nuclear polyhedrosis viruses (NPVs) and granulosis
viruses (GVs). In this aspect of the invention the vector acts as a carrier for
the HaSV genes encoding insecticidal activity. The recombinant insect virus
vector may be grown by established procedures (Shieh, 1989; Vlak, in
press) or by any other suitable procedure, and the virus disseminated as needed.
The insect virus vectors may be those described in copending International
application No. PCT/AU92/00413.

The nucleic acid sequence or sequences incorporated into the recombinant
vector may be a cDNA, DNA or RNA sequence and may comprise the DNA or
RNA genome, or a portion thereof, of HaSV or another species.

The term "material derived from another insect virus species or isolate"
includes any nucleic acid sequence or protein sequence, or parts thereof, which
are useful in exerting an insecticidal effect when incorporated in the
recombinant vector of the invention. Suitable nucleic acid sequences for
incorporation into the recombinant vector include those encoding insecticidally
effective agents such as a neurotoxin from the mite Pyemotes tritici (Tomalski,
M.D. & Miller, L.K., Nature 352, 82-85 (1991)), a toxin component of the venom
of the North African scorpion Androctonus australis (Maeda, S. et al., Virology
184, 777-780 (1991); Stewart, L.M.D. et al., Nature 352, 85-88 (1991)), and
conotoxins from the venom of Conus spp. (Olivera, B.M. et al., Science 249,
257-263 (1990); Woodward, S.R. et al., EMBO J. 9, 1015-1020 (1990); Olivera,
B.M. et al., Eur. J. Biochem. 202, 589-595 (1991)).

The exogenous nucleic acid sequence may be operably placed into the insect
virus vector between a viral or cellular promoter and a polyadenylation signal.
Upon infection of an insect cell, the vector virus will cause the production of
either infectious virus genomic RNA or infectious encapsidated viral particles.

The promoters may be constitutive or inducible. These include
tissue-specific promoters, temperature-sensitive promoters or promoters which
are activated when the insect feeds on a metabolite in the plant that it is
desired to protect.

Recombinant insect virus vectors according to the present invention may
include nucleic acid sequences comprising all or an infectious or insecticidally
effective portion of the HaSV genome and, optionally, of the genome of another
insect virus species or isolate.

In a particularly preferred embodiment of the present invention there are
provided assembled capsids comprising one or more of the capsid proteins of
the present invention, or derivatives or variants thereof as contemplated or
described herein. These assembled virus capsids are useful as vectors for
insecticidal agents. As such, the assembled viral capsids may be used to
administer insecticidal agents, such as various nucleotide sequences with
insecticidal activity or various toxins, to an insect. Nucleotide sequences in the
form of RNA or DNA which can be used include those of the HaSV genome
or of other insect viruses. Toxins which can be used advantageously include those
which are active intracellularly and may also include neurotoxins with an
appropriate transport mechanism to reach the insect neurones.



SUBSTITUTE SHEET [ 




The efficacy or insecticidal activity of infectious genomic RNA or viral
particles produced by insect cells infected with insect vectors according to this
aspect of the invention may be enhanced as described below. Moreover, the
virus vector itself may include, within one or more non-essential regions, one or
more nucleic acid sequences encoding substances that are deleterious to insects,
such as the insecticidally effective agents described above. Alternatively, an extra
genome component may be added to the HaSV genome, either by insertion
into one of the HaSV genes or by adding it to the ends of the genome.

In a particularly preferred embodiment there is provided a recombinant
baculovirus vector comprising HaSV or part thereof having insecticidal
properties.

Other modifications which may be made to the infectious recombinant insect
virus according to the fourth aspect include:

i) splitting the exogenous HaSV nucleic acid molecules comprising the
genome and cloning the fragments into the insect vector so that they
cannot rejoin. One component, preferably the virus RNA replicase,
could be expressed from a separately transcribed fragment, the
transcripts of which would not be replicated by the replicase they
encode. The remainder of the genome (having insecticidal activity or
encoding the capsid protein or a separate toxin mRNA) could be
encoded by, for example, a second separately transcribed fragment, the
transcripts of which are capable of being amplified by the replicase.
Consequently, whilst the transcripts from the second or other fragment
would exert their insecticidal activity upon the infected insect cell, they
would not be able to infect another insect cell (even if encapsidated),
because the replicase or replicase-encoding transcripts would be absent.

This modification would allow an inherent biological containment to be
built into the insecticidal vectors which, when used in conjunction with
non-persistent DNA virus vectors such as those described in the
above-mentioned copending application, would allow a new level of
environmental safety, greatly extending earlier approaches based on
baculovirus vectors;

ii) manipulation of encapsidation signals or sequences essential for
replicase binding or production of sub-genomic mRNAs, including
expression of exogenous insect control factors as RNAs dependent on
the virus for replication. This involves determination of RNA
sequences and signals important for replication and encapsidation of
virus RNAs, such as by analysis of replication of deletion mutants
carrying reporter genes in appropriate cells, followed by studies on the
transmission of the reporter gene to larvae by feeding of virus. These
deletion mutants can be used to carry genes for insect control
factors/toxins to larvae after replacing the reporter gene by a suitable
toxin gene, as shown in Fig. 12;

iii) using an insect promoter responsive to virus infection and, for example,
placing copies of the viral replicase gene under the control of two
promoters, one of which is constitutive or expressed at early stages of
vector infection, and the other being a cellular promoter turned on by
the ensuing RNA viral infection. This system would then make more
copies of the replicase mRNA available as the amount of its template
increased. Such a promoter may be isolated using techniques analogous
to enhancer trapping, that is, transforming insect cells with a suitable
reporter gene and looking for induction of the reporter upon virus
infection of a population of transformed cells.

In a fifth aspect the invention relates to a method of controlling insect attack
in plants by genetically manipulating plants to express HaSV or parts thereof
which can confer insecticidal activity, optionally in combination with other
insecticidally effective agents. Such plants are referred to as transgenic plants.

The term "express" should be understood as referring to the process of
transcribing the genome or portion thereof into RNA or, alternatively, the
process of transcribing the genome or portion thereof into RNA and then, in
turn, translating the RNA into a protein or peptide.

In a sixth aspect the invention relates to the transgenic plants per se as
described above. Transgenic plants according to the invention may be
prepared, for example, by introducing a DNA construct including a cDNA or
DNA fragment encoding all or a desired infectious portion of HaSV into the
genome of a plant. The cDNA or DNA fragment may preferably be operably
placed between a plant promoter and a polyadenylation signal. Promoters may
cause constitutive or inducible expression of the sequences under their control.
Furthermore, they may be specific to certain tissues, such as the leaves of a
plant where insect attack occurs, but not to other parts of the plant such as those
used for food. The inducible promoters may be induced by stimuli such as
disturbance by wind or insect movement on the plant's tissues, or may be
specifically turned on by insect damage to plant tissues. Heat may also be a
stimulus for promoter induction, such as in spring when temperatures increase
and the likelihood of insect attack also increases. Other stimuli, such as spraying
with a chemical (for instance a harmless chemical), may induce the promoter.

The cDNA or DNA fragment may encode all or a desired infectious portion of
the wild-type, recombinant or otherwise mutated HaSV. For example, deletion
mutants could be used which lack segments of the viral genome that are
non-essential for replication or perhaps pathogenicity.

The nucleotide sequences of the invention can be inserted into a plant genome
by established techniques, for example by an Agrobacterium transfer
system or by electroporation.

Plants which may be used in this aspect of the invention include plants of both
economic and scientific interest. Such plants may be those in general which
need protection against the insect pests discussed herein, and in particular
include tomato, potato, corn, cotton, field pea and tobacco.

To enhance the efficacy of infectious genomic RNA or viral particles expressed
by transgenic plants according to the invention, the DNA construct introduced
into the plants' genome may be engineered to include one or more exogenous
nucleic acid sequences encoding substances that are deleterious to insects.
Such substances include, for example, Bacillus thuringiensis δ-toxin, insect
neurohormones, insecticidal compounds from wasp or scorpion venom or of
heterologous origin, or factors designed to attack and kill infected cells in such
a way as to cause pathogenesis in the infected tissue (for example, a
ribozyme targeted against an essential cellular function).

DNA constructs may also be provided which include:

i) mechanisms for regulating pathogen expression (for example,
mechanisms which restrict the expression of ribozymes to the insect
cells) by tying, for example, expression to abundant virus replication or to
the production of minus-strand RNA or sub-genomic mRNAs; and/or

ii) mechanisms similar to, or analogous to, those described in copending
International patent application number PCT/AU92/00413 so as to
achieve a limited-spread system (such as control of replication).

Transgenic plants according to the present invention may also be capable of
expressing all or an infectious or insecticidal portion of the genomes of HaSV
and of one or more species or isolates of insect viruses.

In a seventh aspect of the invention, HaSV, or insecticidally effective parts
thereof, or the infectious recombinant virus vectors of the fourth aspect of the
present invention may be applied directly to the plant to control insect attack.
HaSV or the recombinant virus vectors may be produced either in whole or in
part in whole insects, in cultured insect cells, in bacteria, in yeast or in some
other expression system. HaSV or the recombinant virus forms may be applied
in crude, semi-purified or purified form, optionally in admixture with an
agriculturally acceptable carrier, to the crop in need of protection. HaSV may
also be applied as a facilitator of infection where existing insect populations
are already infected with another agent, such as one or more other viruses,
whereby HaSV is able to act synergistically to bring about an insecticidal
effect. Alternatively, HaSV and another agent, such as one or more viruses,
may be applied together to plants to control insects feeding thereon.

A deposit of HaSV No. 18.4 was made on August 5th 1992 at the Australian 
Government Analytical Laboratories. The deposit was given accession No. 
N92/35575. 

EXAMPLE 1

TAXONOMIC, PHYSICOCHEMICAL
AND BIOCHEMICAL CHARACTERISATION
OF AN INSECT VIRUS: HaSV

Materials and Methods

A Animals and virus production. H. armigera larvae were raised as
described in Teakle R.E. and Jensen J.M. (1985) Heliothis punctiger. In:
Singh P. and Moore R.F. (eds) Handbook of Insect Rearing Vol. 2,
Elsevier, Amsterdam, pp. 313-322. Larvae were infected for virus
production by feeding five-day-old larvae on 10 mg pieces of diet to
which 0.064 OD260 units of HaSV had been applied. After 24 hours
the larvae were transferred to covered 12-well plates (BioScientific,
Sydney, Australia) that contained sufficient diet and grown for eight
days, after which they were collected and frozen at -80 °C until further
processed. Frozen larvae (100 g) were placed into 200 ml of
50 mM Tris buffer (pH 7.4), homogenized, and filtered through four
layers of muslin. This homogenate was centrifuged in a Sorvall SS-34
rotor at 10,000 × g for 30 minutes, whereupon the supernatant was
transferred to fresh tubes and recentrifuged in a Beckman SW-28 rotor at
100,000 × g for 3 hours. The resultant band was collected and repelleted in
50 mM Tris buffer, pH 7.2, in a Beckman SW-28 tube by centrifugation
at 100,000 × g for 3 hours. The pelleted virus was resuspended overnight
in 1 ml of buffer at 4 °C, then layered onto a discontinuous CsCl gradient
containing equal volumes of 60% and 30% CsCl (w/v) in a Beckman
SW-41 tube and centrifuged for 12 h at 200 × g. The resultant pellet was
suspended in 100 µl of buffer and frozen for further use.

B Particle characterization. Staining with acridine orange was as
described in Mayor H.D. and Hill N.O. (1961) Virology 14: 264.
Buoyant density was estimated in CsCl gradients according to Scotti
P.D., Longworth J.F., Plus N., Crozier G. and Reinganum C. (1981)
Advances in Virus Research 26: 117-143.

C Immunological procedures. Rabbit antiserum to HaSV was produced by
standard immunological procedures. Rabbit antiserum to the Nudaurelia
ω virus, in addition to the virus itself, was provided by Don Hendry
(Rhodes University, Grahamstown, South Africa). Rabbit antiserum to
the Nudaurelia β virus was supplied by the late Carl Reinganum (Plant
Research Institute, Burnley, Vic., Australia). The immunological
relationship to the Nudaurelia ω virus was determined by the standard
reciprocal double diffusion technique. Immunoblotting was performed
according to Towbin H., Staehelin T. and Gordon J. (1979) PNAS.
Antibodies monospecific for the major 65 kDa capsid protein were
prepared by incubating polyclonal antisera with sections of
nitrocellulose blotted with the 65 kDa protein. After extensive washing
in Tris-buffered saline, the bound antibodies were eluted in 50 mM citric
buffer, pH 8.0, after a 5 minute incubation.

D Protein characterization. Polyacrylamide gel electrophoresis in the
presence of SDS followed the procedure of Laemmli U.K. (1970) Nature
227: 680-685 and was done with 12.5% gels unless otherwise noted, with
low and high molecular weight standards from BioRad. Staining was
done with a colloidal preparation of Coomassie Blue G-250 (Gradipore
Ltd, Pyrmont, New South Wales, Australia). Determination of the Mr
of the smallest protein was done with a 16% gel and standards of 3.4
kDa, 12.5 kDa and 21.5 kDa (Boehringer Mannheim). Glycosylation of
the viral proteins was determined by a general glycan staining procedure
with reagents supplied by Boehringer Mannheim; the positive control
was fetuin. N-termini of proteins were sequenced using procedures
described by Matsudaira (1989) Purification of Proteins and Peptides
by SDS-PAGE, in A Practical Guide to Protein and Peptide Purification
for Microsequencing, ed. Matsudaira P.T., Academic Press, San Diego, pp.
52-72, on an Applied Biosystems 477A gas phase sequencer.



E Nucleic acid characterization. RNA was removed from capsids by twice
vortexing a virus suspension with equal volumes of neutralized phenol,
then with phenol/chloroform (50:50). RNA was then precipitated from
the aqueous phase in the presence of 300 mM sodium acetate and 2.5
volumes of ethanol. Digestions of the HaSV nucleic acid with RNase
A and DNase I (Boehringer Mannheim) were done with pBSSK(-)
phagemid ssDNA and dsDNA (Stratagene) and RNA (BRL) as controls.
Denaturing agarose gel electrophoresis in the presence of formaldehyde
was performed according to Sambrook et al. (1989). The state of
polyadenylation of the viral RNA was determined by two methods.
The first method was to compare the binding of identical amounts (20
µg) of viral RNA and poly(A)-selected RNA from Helicoverpa virescens
to a 1 ml slurry of 5 mg of oligo-d(T) cellulose (Pharmacia) in a binding
buffer consisting of 20 mM Tris pH 7.8, 500 mM NaCl, 1 mM EDTA
and 0.04% SDS. The second method was to observe specific priming of
viral RNA, and of viral RNA polyadenylated with poly(A) polymerase
(Pharmacia), with d(T)15A/C/G primers in RNA sequencing reactions
using reverse transcriptase (US Biochemical) and a protocol provided
by the supplier. The 5' structure of the genomic RNA of HaSV
was determined by observing the ability of polynucleotide kinase to
phosphorylate viral RNA with and without preincubation with tobacco
acid pyrophosphatase and alkaline phosphatase (Promega) under
conditions described by the supplier.



F In vitro translation of HaSV RNA. In vitro translation of HaSV RNA
was performed with lysates of both rabbit reticulocytes and wheat germ
(Promega) as directed by the supplier. Reactions were conducted in 10
µl volumes with 1.0 µg of RNA in the presence of five µCi of labelled
methionine. The labelled proteins were resolved on 10% and 14%
SDS-PAGE gels as described above, then visualised by autoradiography
of the dried gels. The two viral RNAs were separated by a "freeze and
squeeze" method after resolution on nondenaturing low-melting-point
agarose gels in TAE (Sambrook et al., 1989). Briefly, agarose slices
containing the RNA were melted at 65 °C in a volume of TAE buffer
equal to six times the agarose volume. The solution was allowed to gel
on ice before freezing at -80 °C for 30 minutes. The frozen solution
was thawed on ice, then centrifuged at 14,500 × g for 10 minutes, after
which the supernatant was withdrawn and precipitated by the addition
of ethanol.



G Bioassay of virus-induced pathogenesis

Known amounts of virus isolate, as shown in Figure 4, were fed
to larvae at the growth stages indicated by admixture to standard
diet. At the time points shown, the larvae were weighed and the
mean and SD calculated. Growth of infected larvae was
compared to that of uninfected control populations from the
same hatching batch in every experiment.
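The weight comparison in G can be sketched as below: a minimal Python example, assuming larval weights in mg at one time point, that reports the mean and sample SD of each group and the infected mean as a percentage of the control mean. The weights are hypothetical placeholders, not data from Figure 4, and the percentage-of-control figure is an illustrative summary rather than the analysis reported here.

```python
from statistics import mean, stdev


def summarise(weights_mg):
    """Return (mean, sample standard deviation) of larval weights in mg."""
    return mean(weights_mg), stdev(weights_mg)


def relative_growth(infected_mg, control_mg):
    """Mean weight of infected larvae as a percentage of the control mean."""
    return 100.0 * mean(infected_mg) / mean(control_mg)


# Hypothetical weights (mg) at a single time point; NOT data from Figure 4.
control = [120.0, 130.0, 110.0, 140.0]   # uninfected larvae, same hatching batch
infected = [30.0, 25.0, 35.0, 30.0]      # larvae fed virus in their diet

print("control mean, SD:", summarise(control))
print("infected mean, SD:", summarise(infected))
print("infected growth as % of control:", relative_growth(infected, control))
```

Using the sample SD (`stdev`) rather than the population SD suits the small groups typical of such bioassays.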




Results 

i) Characteristics and taxonomy of HaSV

The virus particles are isometric and approximately 36-38 nm in
diameter. They are composed of two major capsid proteins, 65 kDa and
6 kDa in size. The virions contain two single-stranded (+) RNA species of 5.3
kb and 2.4 kb length. The virus bears a similarity in these respects to the
Nudaurelia ω virus, which has been tentatively regarded as a member of the
Tetraviridae; these two viruses differ, however, in the above respects from
other viruses in this group and are likely to form a new virus family, sharing
chiefly their capsid structure (T = 4) with the Tetraviridae.

ii) Particle characterization and serology.

The buoyant density of HaSV was calculated to be 1.296 g/ml in CsCl at pH
7.2. The A260/A280 ratio of HaSV viral particles was 1.22, indicating a nucleic
acid content of approximately 7% (Gibbs and Harrison (1976) Plant Virology:
The Principles, London: Edward Arnold). Reciprocal immuno-double diffusion
comparisons between HaSV and the Nudaurelia ω virus showed no serological
relationship. The more sensitive technique of immunoblotting also showed a
complete lack of any antigenic relationship. In addition, HaSV did not react
with antisera to the Nudaurelia β virus in an immunodiffusion test or when
immunoblotted. However, no Nudaurelia β virus was available as a positive
control in these latter two immunological experiments. When HaSV was
stained with acridine orange and then irradiated with 310 nm UV light, the
particles fluoresced red, which indicated a single-stranded genome.

iii) Protein characterization.

Examination of the capsid proteins of HaSV by polyacrylamide gel
electrophoresis in the presence of SDS showed variable results depending on
the quantity of protein present. At low protein loadings, two proteins in major
abundance were evident, with Mr's of 65,000 and 6,000, along with a protein
in minor abundance with an Mr of 72,000 (data not shown). When more protein
was present on the gels, however, at least 12 more distinct bands with Mr's
ranging between 15,000 and 62,000 became evident. Probing the resolved and
blotted proteins with antibodies monospecific for the major 65 kDa capsid
protein showed that all but two of the proteins shared common antigens with the
major 65 kDa protein. The major 6 kDa capsid protein and a minor band
migrating at Mr = 16,000 failed to react with both the monospecific antibodies
and untreated antisera.

The capsid proteins were shown to be non-glycosylated, as they failed to react
with a hydrazine analog after oxidation with periodic acid. The N-terminus of
the 65 kDa protein appeared to be blocked in some manner, as two efforts to
conduct an Edman degradation failed. After the second attempt, the sample
was treated with N-chlorosuccinimide and shown to be in a quantity normally
adequate for sequencing. The N-terminus of the 6 kDa protein, however, was
not blocked, as an unambiguous 16-residue sequence was readily obtained.
The sequence of the N-terminus of the 6 kDa capsid protein and that of a
cyanogen bromide cleaved fragment of the 65 kDa protein are as follows:

6 kDa protein: 

PheAlaAlaAlaValSerAlaPheAlaAlaAsnMetLeuSerSerValLeuLysS
65 kDa protein: 

ProThrLeuValAspGlxGlyPheTrpIleGlyGlyGlxTyrAlaLeuThrLeu
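The peptide sequences above are written in three-letter residue codes. To look for them in a translated reading frame it is convenient to convert them to one-letter codes; the minimal sketch below does this for the first 18 complete codes of the 6 kDa sequence (Phe through Lys), leaving out the incomplete trailing character. It accepts only the 20 standard residue codes, so any ambiguous or garbled position has to be resolved by hand before conversion.

```python
import re

# Standard IUPAC-IUB three-letter to one-letter amino-acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert a run such as 'PheAlaAla' to 'FAA'.

    Raises ValueError if the input is not a clean run of capitalised
    three-letter tokens, and KeyError for a token that is not one of
    the 20 standard residue codes."""
    tokens = re.findall(r"[A-Z][a-z]{2}", three_letter)
    if "".join(tokens) != three_letter:
        raise ValueError("input is not a clean run of three-letter codes")
    return "".join(THREE_TO_ONE[t] for t in tokens)


# First 18 residues of the 6 kDa N-terminal sequence.
print(to_one_letter("PheAlaAlaAlaValSerAlaPheAlaAlaAsnMetLeuSerSerValLeuLys"))
```

The one-letter string can then be located in a translated open reading frame with an ordinary substring search.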

Detailed sequence analysis of the RNA genome, carried out in Example 3,
showed that RNA 1 encodes a protein of molecular weight 186,980, hereinafter
referred to as P187, and RNA 2 encodes proteins with molecular weights 16,522
(called P17) and 70,670 (called P71). P71 is processed into two proteins of
molecular weight 63,378 (called P64) and 7,309 (called P7).
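The figures above can be checked for internal consistency: cleaving one peptide bond in P71 adds one water molecule, so P64 + P7 should exceed P71 by about 18 Da. The minimal sketch below, assuming the quoted molecular weights are average masses calculated from sequence, finds agreement to within about 1 Da, which is consistent with rounding of the quoted values.

```python
# Average mass of one water molecule (Da), gained on hydrolysis of a peptide bond.
WATER_DA = 18.015


def cleavage_mass_balance(precursor_da, product_das):
    """Return sum(products) - (precursor + water added by cleavage), in Da.

    Splitting one polypeptide into n fragments hydrolyses n - 1 peptide
    bonds, each of which adds one water molecule to the products."""
    n_bonds = len(product_das) - 1
    return sum(product_das) - (precursor_da + n_bonds * WATER_DA)


# Molecular weights quoted above: P71 -> P64 + P7.
diff = cleavage_mass_balance(70670, [63378, 7309])
print(round(diff, 3))  # -1.015
```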

iv) Nucleic acid characterization

The extracted nucleic acid from HaSV was readily hydrolysed by RNase A but
not by DNase I. Denaturing agarose gel electrophoresis of the extracted RNA
genome of HaSV indicated two strands that migrated at 5.5 kb and 2.4 kb.
The RNA strands were shown not to have extensive regions of polyadenylation,
as only 24% of the viral RNA bound to the oligo-d(T) cellulose matrix, as
opposed to 82% of poly(A)-selected RNA. Further evidence for the
non-polyadenylation of the viral genome was provided by the observation that the
oligo primer d(T)15G gave a clear sequencing ladder using reverse
transcriptase only after in vitro polyadenylation of the viral strands with
poly(A) polymerase.
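The logic of the anchored-primer test can be sketched as follows. A d(T)15N primer (N = A, C or G at its 3' end) primes efficiently only where a poly(A) tract is present, with its 3'-terminal base pairing to the last non-A nucleotide before that tract, so an RNA lacking poly(A) gives no specific ladder until poly(A) is added in vitro. The helper `anchored_primer_for` is hypothetical, and the example 3' end is invented; the HaSV terminal sequence is not reproduced in this section.

```python
# DNA base that pairs with each RNA base immediately 5' of the poly(A) tail.
PAIR = {"C": "G", "G": "C", "U": "A"}


def anchored_primer_for(rna_3prime: str, n_t: int = 15) -> str:
    """Return the d(T)nN primer (DNA, written 5'->3') expected to prime at
    the junction between an RNA's own sequence and its poly(A) tail.

    The run of T anneals to the poly(A); the 3'-terminal N pairs with the
    last non-A nucleotide before the tail."""
    seq = rna_3prime.upper().replace("T", "U")
    body = seq.rstrip("A")
    if body == seq:
        raise ValueError("no 3' poly(A) tract: anchored oligo(dT) cannot prime")
    if not body:
        raise ValueError("sequence is entirely poly(A)")
    return "T" * n_t + PAIR[body[-1]]


# Hypothetical 3' end after in vitro polyadenylation (not the HaSV sequence).
print(anchored_primer_for("GGAUCC" + "A" * 20))  # 15 x "T" then "G", i.e. d(T)15G
```

Raising an error for RNA without a 3' poly(A) tract mirrors the observation that the untreated viral strands gave no clear ladder.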

The demonstration that the strands could be modified with poly(A) polymerase
also showed the lack of any 3' modification. The 5' termini of the viral strands
were shown to be capped, most likely with m7G(5')ppp(5')G, as they could not
be labelled with polynucleotide kinase unless pretreated with tobacco acid
pyrophosphatase and alkaline phosphatase.

v) In vitro translation

In vitro translation of the viral RNA yielded different results in the two
translation systems used (data not shown). The 5.5 kb RNA translated very
poorly in the reticulocyte system, whereas in the wheat germ system it produced
more than 20 proteins ranging in size from Mr = 195,000 to Mr = 12,000. The
2.4 kb viral RNA strand yielded a major protein with an Mr of 24,000 in both
systems, in addition to a minor protein of Mr 70,000. A time course of the
translation reaction with the 5.5 kb RNA strand showed that all labelled proteins
were produced at similar rates, indicating that the smaller products did not
arise through processing of the larger ones. However, when a time course
experiment was done with translation of the smaller 2.4 kb RNA strand, the 24
kDa protein appeared before the 70 kDa protein.

vi) Presence of another form of HaSV

Frequently, during purification of HaSV virions, a minor band with a buoyant
density of 1.3 g/ml appeared in varying amounts on the CsCl gradient.
On four occasions, when particles from this minor band were used to infect
H. armigera larvae that were then processed as before for purification of HaSV
virions, the HaSV band with a density of 1.296 g/ml was again recovered, in
vast excess to a varying minor amount of the more dense band. No virions of
either type were recovered from uninfected control larvae. Proteins extracted
from the more dense particles appeared identical to those from the less dense
particles when examined by SDS-PAGE and immunoblotting with antibodies
specific for the 65 kDa capsid protein of HaSV. Extraction and examination
of the RNA genome by denaturing agarose gel electrophoresis also showed
the same 5.5 and 2.4 kb bands. When particles from the more dense band
were examined by electron microscopy as before, they appeared to have a
larger diameter (45 nm) but were otherwise highly similar to the 38 nm particles.

The molar ratio of the two RNA strands was determined by quantitative
densitometry of fluorograms of the resolved strands. The ratio derived from
an average of four measurements of various loadings on denaturing gels
proved to be 1.7:1 (5.5 kb strand : 2.4 kb strand), which is somewhat lower than
the expected ratio of 2.3:1 for equimolar amounts of each strand.
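The expected ratio follows from the strand lengths. If the densitometric signal of each strand scales with its length (an assumption, for example uniform labelling per nucleotide), equimolar strands give a signal ratio of 5.5/2.4, about 2.3. Dividing an observed signal ratio by this length ratio converts it to a molar ratio, so 1.7:1 corresponds to roughly 0.74 molecules of the 5.5 kb strand per 2.4 kb strand. A minimal sketch:

```python
def molar_ratio(signal_ratio, length_a_kb, length_b_kb):
    """Convert a signal (mass) ratio of two RNAs into a molar ratio,
    assuming signal is proportional to the number of nucleotides."""
    return signal_ratio / (length_a_kb / length_b_kb)


# Equimolar strands of 5.5 kb and 2.4 kb give a signal ratio equal to their length ratio.
equimolar_signal_ratio = 5.5 / 2.4
observed_signal_ratio = 1.7

print(round(equimolar_signal_ratio, 2))                        # 2.29
print(round(molar_ratio(observed_signal_ratio, 5.5, 2.4), 2))  # 0.74
```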

The genome of HaSV has major differences that make it distinct from those of
the nodaviruses, the only other group of bipartite small RNA viruses
pathogenic to animals. Although HaSV shares the characteristic of a bipartite
genome with the only animal viruses having such a divided genome, the
Nodaviridae, it differs in virtually every other aspect from this group. Both
segments of its genome are considerably larger than the corresponding
nodaviral RNAs (Hendry D.A. (1991) Nodaviridae of Invertebrates, in (ed. E.
Kurstak) Viruses of Invertebrates, Marcel Dekker, New York, pp. 227-276).
However, the division of genetic labour is similar, with the larger component
carrying the replicase gene and the smaller one encoding the capsid proteins.
Direct comparison of the sequences shows little homology between these
viruses at either the RNA or protein level. The nodaviruses have the already
mentioned unusual 3' blockage (probably a protein), whereas the HaSV RNAs
terminate in a distinctive secondary structure resembling a tRNA.
vii) Bioassays of virus isolates on larvae

The original constructs made to express the capsid proteins (precursor and
processed forms) in E. coli for bioassay started at the first AUG (nts 284 to
286). Production of full-length, immuno-reactive protein from these was due
to these clones being the 5C sequence version with the extra C residue.

Bioassays of these proteins have been difficult due to problems with obtaining
suitable Heliothis larvae for the tests.

EXAMPLE 2

OTHER VIRUS ISOLATES

Materials and Methods
A Virus isolation

Apparently infected (viz. diseased) larvae of Helicoverpa sp. were collected in
February 1993 at Mullaley (NSW), Narrabri (NSW) and Toowoomba (QLD)
(Australia). Referring to Fig. 10, the samples in wells 2A-2D were from
parasitised H. armigera larvae collected from sorghum at Mullaley; the sample
in 6C was collected from sunflower at Toowoomba; the sample in 7D was
collected from cotton at the Narrabri Research Station. The latter two larvae
may have been either H. armigera or H. punctigera, which are both easily
infected with HaSV.

B Virus RNA Extraction

Larvae collected were ground up and RNA extracted. RNA extraction and
purification were as per Example 1.

C Dot-Blot Northern Hybridization

Extracts of viral RNA were analysed by Northern dot-blot hybridisation using a
probe made by random priming, with a Boehringer Mannheim kit according to
the supplier's instructions, from cloned HaSV sequences derived from the
3'-terminal 1000 nucleotides of RNA 1 and RNA 2. RNA extracts were
transferred to Zeta-Probe (BioRad) for probing. Hybridization and
high-stringency washing conditions were as specified by BioRad.
Hybridizations were carried out in the following solution:

1 mM EDTA, 500 mM NaH2PO4, pH 7.2, 7% SDS, at 65 °C in a
rotating Hybaid hybridization chamber. After completion of
hybridization and removal of the solution containing the probe,
the filters were washed twice in 1 mM EDTA, 40 mM NaH2PO4,
pH 7.2, 5% SDS, at 65 °C (1 h each), followed by 2 washes in 1
mM EDTA, 40 mM NaH2PO4, pH 7.2, 1% SDS, at 65 °C (1 h
each), before autoradiography.

RESULTS

Referring to Fig. 10, samples 9A, 9B, 10A, 10B and 10C contain HaSV-infected
positive control lab-raised larvae; 9C-H contain healthy (HaSV-free)
negative control lab-raised larvae; all other wells (numbered 1-8) contain
extract from field-collected larvae. Numbers 2A-D, 6C and 7D gave positive
signals, indicating that these isolates are either the same as HaSV or
derivatives or variants thereof. Electron microscopy employing negative staining
confirmed that the samples which gave positive signals contained abundant
icosahedral virus particles approximately 36 nm in size.

The presence of HaSV in larvae which had tested positive in the Northern
dot-blot hybridization was confirmed by Western blotting of crude extracts
from such infected larvae, using the polyclonal antibody to the HaSV capsid
protein. For routine screening of such extracts in order to identify further
isolates of HaSV or to confirm the presence of the virus, use of a monoclonal
antibody or its equivalent is preferable, in order to achieve (i) higher sensitivity
of detection and (ii) greater specificity of detection.



EXAMPLE 3

IDENTIFICATION, ISOLATION AND CHARACTERISATION OF
VIRUS GENES

Materials and Methods
A Animals and virus production.

H. armigera larvae were raised as described in Example 1.

B Protein characterization

Was conducted as described in Example 1.

C Nucleic acid characterization
Was conducted as in Example 1.

D Fractionation of virus RNA

The two viral RNAs were separated by a "freeze and squeeze" method after
resolution on nondenaturing low-melting-point agarose gels in TAE
(Sambrook et al., 1989). Briefly, agarose slices containing the RNA were
melted at 65 °C in a volume of TAE buffer equal to six times the agarose
volume. The solution was allowed to gel on ice before freezing it at -80 °C
for 30 minutes. The frozen solution was thawed on ice, then centrifuged at
14,500 × g for 10 minutes, after which the supernatant was withdrawn and
precipitated by the addition of ethanol.

E In vitro translation of HaSV RNA
Was as in Example 1.

F cDNA synthesis and cloning of virus genome
The virus RNAs were reverse transcribed into cDNA using Superscript
RTase (a modified form of the Moloney murine leukaemia virus (MMLV)
RTase, produced by Life Technologies Inc). Oligo(dT) was used as a primer
on RNA which had been polyadenylated in vitro. After size selection of DNA
fragments over 1 kbp in length, the cDNA was blunt-end ligated using T4
DNA ligase (Boehringer Mannheim or Promega, under conditions described by
the suppliers) into vector pBSSK(-) (Stratagene), which had been cut with
EcoRV and dephosphorylated with calf intestinal alkaline phosphatase
(Boehringer Mannheim). E. coli strain JM109 or JPA101 was electroporated
with the ligation mixture and white colonies were selected on colour-indicator
plates (Sambrook et al., 1989).

For some clones of RNA 2, cDNA was synthesised using the RTase of AMV
(Promega) and a specific primer complementary to nucleotides 2285-2301
of RNA 2. The same buffer and conditions were used as for the Superscript
RTase (above). The AMV RTase was found not to make cDNA from a
primer annealing to the terminal 18-nucleotide sequence (see below), nor to
be able to reach the 5' end of the RNA with the primer described here.

G Sequencing of DNA and RNA.

The cDNA clones were sequenced as single-stranded or double-stranded DNA,
using the deaza-dGTP and deaza-dTTP nucleotide analogues (Pharmacia) in
the deaza T7 sequencing kit as recommended by this supplier. Synthetic
oligonucleotides were used as primers. The 5'-terminal sequences of the two
RNAs were determined using reverse transcriptase to sequence the RNA
template directly, from specific oligonucleotide primers located about 200
nucleotides downstream from the termini. Such RNA sequencing was
performed using the reverse transcriptase sequencing kit from Promega, under
the conditions described by the manufacturer.

The sequence of the 20 or so nucleotides at the 5' terminus of each RNA was
checked by direct RNase digestion of end-labelled RNA under conditions
designed to confer sequence specificity. Direct RNA sequencing using RNases
was performed with the RNase sequencing kit from US Biochemicals,
following the protocols provided by the manufacturer. This also confirmed
that the sequence of the most abundant RNA is consistent with that of the
RNA analysed using the specific primer and RTase.
All transcriptions of plasmids linearized as described were performed as
recommended by the suppliers of SP6 RNA polymerase, in the presence of
1 mM cap analogue, 0.2 mM GTP, and 0.5 mM of each of the other NTPs.

H Subcloning and expression
PCR amplification

The polymerase chain reaction (PCR) was used to obtain sequences covering
virus genes in a form suitable for cloning into expression vectors. The reaction
was performed with Taq DNA polymerase (Promega) as described by the
supplier, in a rapid-cycling thermal sequencer manufactured by Corbett
Research (Sydney, Australia). A typical reaction involved 1 cycle of 1 min at
90 °C; 25 cycles of 95 °C (10 s), 50 °C (20 s) and 72 °C (13 min); followed by
one cycle of 72 °C for 5 min. Templates were generally cDNA or cDNA
clones derived from HaSV RNAs, made as described below. Primers were as
described below for the relevant constructs.

Upon termination of the PCR, the product's ends were made blunt by
treatment with E. coli DNA polymerase I (Klenow fragment) at ambient
temperature for 15 minutes. After heating at 65 °C for 10 minutes, the
reaction was cooled on ice and the reaction mix made 1 mM in ATP. The
product was then 5'-phosphorylated using 5 units of T4 polynucleotide kinase at
37 °C for 30 minutes. After heating at 65 °C for 10 minutes, the product was
run on a 1% low-melting-point agarose gel and purified as described for RNA in
section E above.

Ligations: Vectors and restriction fragments cut with the enzymes described
were run on 1% low-melting-point agarose gels and excised as slices. These
slices were melted at 65 °C for 5 minutes before cooling to 37 °C.
Fragments and vectors were then ligated overnight using T4 DNA ligase
(BRL, Boehringer Mannheim or Promega), in the buffers supplied by the
manufacturers.
Expression: Expression plasmids containing viral genes (e.g. for the capsid
protein) were transformed into E. coli strain BL21(DE3) or HMS174(DE3)
(supplied by Novagen). After growth as specified by the supplier, protein
expression was induced by the addition of isopropyl β-D-thiogalactopyranoside
(IPTG) at 0.4 mM to the growing culture for a period of 3 h. Expressed
proteins were analysed by SDS-polyacrylamide gel electrophoresis of bacterial
extracts (Laemmli, 1970).

Results

i) Mapping cDNA clones of HaSV

The template for cDNA synthesis was virus RNA which had been
polyadenylated in vitro. Oligo(dT) was used as a primer for the Superscript
reverse transcriptase (RTase; a modified form of the Moloney murine
leukaemia virus (MMLV) RTase, produced by Life Technologies Inc). The
cDNA was cloned into vector pBSSK(-) as described earlier. The larger clones
were selected for further analysis by restriction mapping and Northern
hybridization. All the probes tested hybridized either to RNA 1 or to RNA 2,
suggesting that there are no regions of extensive sequence homology between
the two RNAs. Furthermore, screening of a number of other clones excluded
the theoretical possibility that either RNA band may actually contain more
than one species.

ii) RNA 1 clones

Three large RNA 1 clones (BllU, BllO and B35) obtained in the first round
of cloning were further analysed by restriction mapping and shown to form an
overlap spanning over 3 kbp (this was later confirmed by sequencing). The
second round of cloning then yielded E3 of 5.3 kbp, representing 99.7% of
RNA 1. A complete restriction map of clone E3 showed it to align with that
previously determined for the three overlapping clones. On the basis of this
alignment, the 5' end of the insert in BllU was placed about 300 nucleotides
downstream from the 5' end of the RNA.
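The 99.7% figure can be reproduced from the lengths given in section v) below: RNA 1 is 5310 nucleotides and clone E3 extends from the 3' end to 18 nucleotides from the 5' end. A minimal sketch, assuming coverage is simply the fraction of nucleotides present in the insert:

```python
def clone_coverage(rna_length_nt, missing_5prime_nt, missing_3prime_nt=0):
    """Percentage of an RNA represented by a cDNA clone that lacks some
    terminal nucleotides at its 5' and/or 3' end."""
    covered = rna_length_nt - missing_5prime_nt - missing_3prime_nt
    return 100.0 * covered / rna_length_nt


# RNA 1 is 5310 nt; clone E3 runs from the 3' end to 18 nt short of the 5' end.
print(round(clone_coverage(5310, 18), 1))  # 99.7
```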
Once clones covering a contiguous block had been identified, their orientation
relative to the RNA was determined.

iii) RNA 2 clones

Three significant cDNA clones were isolated for RNA 2 (Fig. 2). One, hr236,
contains about 88% of RNA 2 (2470 bp total length), and runs from the 3' end
to 240 bp from the 5' end. The other clones, hr247 and hr249, are
3'-coterminal subgenomic fragments of 1520 bp and 760 bp, respectively.
Orientation of clone hr236 was determined by strand-specific hybridization.
While a much stronger signal was seen with a probe for one orientation, the
probe specific for the other orientation also yielded a signal, indicating that
there are extensive regions of reverse complementarity within the positive-strand
sequence. Such sequences are likely to form extensive short- and
long-range secondary structure.
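A signal from the opposite-sense probe implies that segments of the positive strand have reverse-complementary partners elsewhere in the same strand. One simple way to list candidate segments in a sequence is to look for k-mers whose exact reverse complement occurs further downstream, as in the minimal sketch below. This is only a screen under stated assumptions (exact Watson-Crick matches, no G·U pairs, no free-energy model), not a secondary-structure prediction, and the toy input is hypothetical rather than the RNA 2 sequence of Figure 2.

```python
# Maps each RNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A/C/G/U)."""
    return rna.translate(COMPLEMENT)[::-1]


def inverted_repeats(rna: str, k: int = 8):
    """Return (i, j) 0-based start positions where the k-mer at j is the
    exact reverse complement of the k-mer at i and lies wholly downstream
    of it (j >= i + k): candidate stem segments of a hairpin or
    long-range helix. Only exact Watson-Crick matches are found."""
    seq = rna.upper().replace("T", "U")
    starts = {}
    for j in range(len(seq) - k + 1):
        starts.setdefault(seq[j:j + k], []).append(j)
    pairs = []
    for i in range(len(seq) - k + 1):
        for j in starts.get(reverse_complement(seq[i:i + k]), []):
            if j >= i + k:
                pairs.append((i, j))
    return pairs


# Hypothetical toy hairpin: GGG pairs with CCC across an AAA loop.
print(inverted_repeats("GGGAAACCC", k=3))  # [(0, 6)]
```

Indexing every k-mer once in a dictionary keeps the search roughly linear in sequence length for realistic k, rather than comparing all pairs of windows.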

The clones contain the 3' sequence of HaSV RNA 2, as they all have the same
3' sequence adjacent to the poly(A) stretch added in vitro before cDNA
priming. The remaining 5' sequence of RNA 2 has been obtained by direct
RNA sequencing using two reverse transcriptases as described above.

iv) Sequencing of virus genome

The clones mapped in section (i) were selected for further analysis by
sequencing.

The cDNA clones were completely sequenced as single-stranded DNA in both
orientations, using the deaza-dGTP and deaza-dlTP nucleotide analogues
(Pharmacia) and synthetic oligonucleotides as primers.

v) Sequence of genome component 1 (see Figure 1) 

30 The 5310 nudeotides of RNA 1 encode a protein of molecular weight 187,000 
yMch is regarded as the RNA-dependent RNA polymerase (replicase) in view 
of its amino add sequence similarity in certain limited regions to replicases of 
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other RNA viruses. The apparent molecular weight of this protein upon in 
vitro translation of virus RNA and SDS-PAGE is 195,000. 

Sequence analysis of RNA 1 was concentrated on clone E3, which extends from
the 3' end of RNA 1 to 18 nucleotides from the 5' end (Figure 1). The
complete sequence has been confirmed by sequencing in both directions. An
ORF of 1750 amino acids, spanning virtually the complete RNA (5310
nucleotides in length), has been detected. This ORF begins with the first AUG
in the sequence, at position 34, and terminates at nucleotide 5290. It is
thought to encode the RNA-dependent RNA polymerase (replicase) (referred
to as P187 in Fig. 1) required for virus replication, since it contains the
Gly-Asp-Asp conserved triplet and surrounding sequences identified in these
enzymes, which are usually large (over 100 kDa), in addition to further
homology with the polymerase encoded by tobacco mosaic virus and other
plus-stranded RNA viruses.
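Here is a minimal sketch of the first-AUG rule and the GDD check described above. It assumes the standard genetic code and uses a toy sequence, not HaSV RNA 1. The names `first_orf` and `CODON` are illustrative.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def first_orf(seq: str) -> tuple[int, str]:
    """Translate from the first ATG (AUG) to the first in-frame stop codon.
    Returns (1-based position of the A of the start codon, protein without the
    stop), or (-1, "") if there is no ATG."""
    seq = seq.upper().replace("U", "T")
    start = seq.find("ATG")
    if start < 0:
        return -1, ""
    protein = []
    for p in range(start, len(seq) - 2, 3):
        aa = CODON.get(seq[p:p + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return start + 1, "".join(protein)


pos, prot = first_orf("CCAUGGGUGAUGAUUAAGG")  # toy RNA, not HaSV RNA 1
print(pos, prot, "GDD" in prot)  # 3 MGDD True
```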



Referring to Fig. 1, the sequence is presented as the upper strand of the cDNA
sequence. This strand is therefore in the same sense as the viral (positive-sense)
RNA. The sequence of the protein encoded by the major open reading
frame, encoding the putative RNA-dependent RNA replicase, is shown, as are
those of the small open reading frames at the 3' end, corresponding to the
proteins P11a, P11b and P14.

Clone E3 was inserted downstream of the SP6 promoter for in vitro
transcription. As mentioned above, the transcript of this clone can be
translated in the wheat germ system to yield the 195 kDa protein observed
upon translation of fractionated RNA 1 from the virus. The latter yields more
lower-molecular-weight products, presumably because it is contaminated with
nicked and degraded RNA. The products derived from the in vitro transcript
can therefore be regarded as defining the coding capacity of the complete
RNA 1 of HaSV.
vi) Sequence of genome component 2 (see Figure 2)

The 2470 nucleotides encode a protein of molecular weight 71,000, which
contains the peptide sequences corresponding to those determined from the
two virus capsid proteins. This protein is therefore the precursor of these
capsid proteins. The protein is a major product of in vitro translation of this
RNA, obtained either from virus particles or by in vitro transcription of a
full-length cDNA clone. In addition, another major translation product of
apparent molecular weight 24,000 is obtained. This protein is derived from a
molecular weight 17,000 reading frame overlapping the start of the capsid
protein gene.

Clones hr236 and hr247 were completely sequenced as the first step in RNA 2
sequencing. These sequences were then extensively compared to the sequence obtained
by direct RNA sequencing using AMV reverse transcriptase.

Comparison of the cloned sequence with that obtained by direct RNA sequencing
showed that both clones lacked 50 nucleotides present in the RNA (at around
nucleotide 1500). The sequence of this stretch was obtained by direct RNA
sequencing using the AMV RTase. The MMLV "Superscript" RTase, which
was used to make all the cDNA clones, was found to simply bypass this region
in sequencing reactions. These 50 nucleotides contain a very stable GC-rich
hairpin flanked by a 6 bp direct repeat, and the MMLV RTase skips from the
first repeat to the second.
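The bypass can be modelled, as a simplified sketch, as a jump from the first copy of the direct repeat to the second. In this model one copy of the repeat survives, and the missing stretch equals one copy of the repeat plus everything between the two copies. The function name and toy sequences below are hypothetical.

```python
def skip_between_repeats(template: str, repeat: str) -> str:
    """Model a reverse-transcriptase jump from the first copy of a direct repeat
    to the second: the product keeps a single copy of the repeat and loses
    everything between the two copies (e.g. a stable hairpin)."""
    first = template.find(repeat)
    if first < 0:
        return template
    second = template.find(repeat, first + len(repeat))
    if second < 0:
        return template
    return template[:first] + template[second:]


repeat = "GCATGC"                    # toy 6 bp direct repeat
hairpin = "GGGGCCCCTTTTGGGGCCCC"     # toy GC-rich stem-loop
template = "AAAA" + repeat + hairpin + repeat + "TTTT"
product = skip_between_repeats(template, repeat)
print(product, len(template) - len(product))  # AAAAGCATGCTTTT 26
```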

The sequence of RNA 2 was then completed using plasmids pSR2A and
pSR2P70, constructed as described below. The plasmids contain a segment of
cDNA derived using the AMV RTase, as well as the sequence corresponding to
the 5'-terminal 240 nucleotides of RNA 2, which are not present on phr236 (Fig. 2).
The sequence of RNA 2 in Fig. 2 is presented as the upper strand of the cDNA
sequence. This strand is therefore in the same sense as the viral (positive-sense)
RNA. The sequences of the proteins encoded by the major open
reading frames, encoding the capsid protein precursor P71 and P17, are also shown.
The sequence of RNA 2 encodes a major ORF running from a methionine
initiation codon at nucleotides 366 to 368 to a termination codon at
nucleotides 2307 to 2309. The protein encoded by this ORF has a theoretical
molecular weight of 71,000. This initiation codon is in a good context
(AGGatgG), suggesting that it will be well recognized by scanning ribosomes.
The size of the product is close to that of the residual putative precursor
protein identified in purified virus, and to the size of the in vitro translation
product obtained from RNA 2.
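The theoretical size can be cross-checked with simple arithmetic. This sketch assumes a mean residue mass of about 110 Da, which is a common rule of thumb and not necessarily the exact calculation behind the figure quoted. The names are illustrative.

```python
AVG_RESIDUE_DA = 110  # rule-of-thumb mean mass of an amino-acid residue (assumption)


def orf_residues(start_nt: int, stop_last_nt: int) -> int:
    """Amino acids encoded from the first nucleotide of the start codon to the
    last nucleotide of the stop codon (1-based, inclusive); the stop adds none."""
    length = stop_last_nt - start_nt + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return length // 3 - 1


def approx_kda(residues: int) -> float:
    return residues * AVG_RESIDUE_DA / 1000


n = orf_residues(366, 2309)     # AUG at 366-368, stop codon at 2307-2309
print(n, round(approx_kda(n)))  # 647 71
```

The same estimate for the 157-residue P17 reading frame gives about 17 kDa.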

The approach adopted to identify the gene encoding the capsid protein was to
obtain amino acid sequence information from the two abundant capsid
proteins and then locate these sequences on the protein encoded by the virus
RNAs. CNBr-cleaved products of the capsid protein were therefore
sequenced. These fragments gave a clear and unambiguous sequence, shown in
Example 1. The sequences determined were then located on the large ORF
of RNA 2 (Figure 2).
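The mapping strategy can be sketched as follows. The sketch assumes an idealised cyanogen bromide digest that cleaves after every methionine. The toy protein is hypothetical; only the peptide FAAAVS is taken from this example.

```python
def cnbr_fragments(protein: str) -> list[str]:
    """Idealised cyanogen bromide digest: cleave on the C-terminal side of every
    methionine (in reality the Met becomes homoserine/homoserine lactone)."""
    fragments, current = [], ""
    for aa in protein.upper():
        current += aa
        if aa == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


def locate(peptide: str, protein: str) -> list[int]:
    """1-based positions of every occurrence of peptide within protein."""
    hits, i = [], protein.find(peptide)
    while i != -1:
        hits.append(i + 1)
        i = protein.find(peptide, i + 1)
    return hits


toy = "MSERAHMFAAAVSRRM"         # toy protein, not the HaSV capsid sequence
print(cnbr_fragments(toy))       # ['M', 'SERAHM', 'FAAAVSRRM']
print(locate("FAAAVS", toy))     # [8]
```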

In the case of the small capsid protein, the clear and unambiguous sequence
obtained is located near the carboxy terminus of the major ORF on RNA 2.
Starting at the point corresponding to the amino-terminal residue of the
sequence determined for the 6 kDa protein, and continuing to the carboxy
terminus of the complete reading frame, the protein encoded by the sequence
is 7.2 kDa and has a hydrophobic N-terminal region and an arginine-rich (basic)
C-terminal region. It is an extremely basic protein with a pI of 12.6.

The two abundant capsid proteins are derived from a single precursor, which is
processed at a specific site. This site is presumably immediately amino-terminal to
the sequence FAAAVS....

RNA 2 appears to be a bicistronic mRNA (see Figs. 2 and 5). The first
methionine codon is encoded in the sequence of RNA 2 at nucleotides 283 to
285. This ATG is in a poor context (TTTatgA), making it a weaker initiation
codon. It initiates a reading frame of 157 amino acids, encoding a protein of
molecular weight 17,000. (The second AUG [nts 366 to 368] initiates the 71
kDa precursor of the capsid protein.) Since the first AUG is in a poor context,
abundant expression of the capsid precursor would be expected. In fact, in
vitro translation of a full-length RNA 2 transcribed from a reconstructed
cDNA clone yields two major protein products of relative mobility 71,000 and
24,000, similar to those already observed upon translation of viral RNA 2. The
protein of Mr 24,000 appears to correspond to the 157 amino acid protein,
despite the significant anomaly in apparent size. The 24,000 Mr product was
also observed upon translation of an in vitro transcript covering only
nucleotides 220 to 1200 of RNA 2. This region contains no open reading
frame other than those already mentioned and cannot encode a protein longer
than 157 amino acids.
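The "good" and "poor" contexts quoted for the two AUG codons can be classified with a simplified version of Kozak's rule: a purine at -3 and a G at +4. This is an assumed rule of thumb, derived largely from vertebrate mRNAs, and not an analysis performed in this work. `kozak_strength` is a hypothetical helper.

```python
PURINES = {"A", "G"}


def kozak_strength(context: str) -> str:
    """Classify an initiation codon from a 7-nt window written -3 -2 -1 A T G +4
    (e.g. 'AGGatgG'), using the simplified Kozak rule: a purine at -3 and a G at
    +4 -> 'strong'; only one of the two -> 'adequate'; neither -> 'weak'."""
    s = context.upper().replace("U", "T")
    if len(s) != 7 or s[3:6] != "ATG":
        raise ValueError("expected a window of the form NNNATGN")
    good_minus3 = s[0] in PURINES
    good_plus4 = s[6] == "G"
    if good_minus3 and good_plus4:
        return "strong"
    if good_minus3 or good_plus4:
        return "adequate"
    return "weak"


print(kozak_strength("AGGatgG"))  # strong  (P71 start, nt 366-368)
print(kozak_strength("TTTatgA"))  # weak    (P17 start, nt 283-285)
```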

The protein of Mr 24,000 seen upon in vitro translation appears to correspond
to P17. The anomaly in apparent size is probably due to the high
content of proline (P), glutamate (E), serine (S) and threonine (T). These
amino acids cause the protein to run more slowly on a gel, thereby giving it an
apparent size of Mr 24,000.
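Here is a minimal sketch for quantifying the Pro/Glu/Ser/Thr content mentioned above. The P17 sequence itself is not reproduced here, so the example string is a placeholder.

```python
def pest_fraction(protein: str) -> float:
    """Fraction of residues that are Pro (P), Glu (E), Ser (S) or Thr (T)."""
    protein = protein.upper()
    if not protein:
        return 0.0
    return sum(protein.count(aa) for aa in "PEST") / len(protein)


print(pest_fraction("MPESTAAK"))  # 0.5  (placeholder sequence)
```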

The Mr 24,000 protein (hereinafter referred to as P17) may have a function in
modifying or manipulating the growth characteristics or cell cycle of
HaSV-infected cells. A protein of 16 kDa (identified in Example 1) is found
in small amounts in the capsid but does not react with antiserum against the
virus particles. This protein is unlikely to correspond to P17, since a preparation of the
latter migrates with a molecular weight of 24,000 on SDS gels.

Sequence analysis of the region from nucleotide 500 to 600 of RNA 2 showed
that it has the sequence shown in Fig. 2, as do the plasmids pSR2A, pSR2P70,
pSR2B and pSXR2P70. However, plasmids pT7T2-P65 and pT7T2-P70 have
an extra C residue at nucleotide 570. The RNA sequence from which they are
derived is shown in Fig. 2 (the "5C" version). In this sequence the first
ATG (nucleotides 283 to 285) is in the same reading frame as most of the
capsid protein gene. The resultant fusion protein is called "P70" and its
carboxy-terminal-truncated version (a variant of the native P64) is "P65". In
view of these clones it was considered important to determine whether any virus
RNA carrying the extra C residue was present in the viral RNA population
first isolated for investigation.
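The frame arithmetic behind the fusion can be checked with a short sketch. Before the insertion, the first ATG (nt 283) and the capsid ATG (nt 366) lie in different frames. A single extra nucleotide at position 570 moves downstream capsid codons into the P17 frame. The function names and the choice of nt 600 as an example codon are illustrative.

```python
def frame(pos: int) -> int:
    """Reading frame (0, 1 or 2) of a 1-based nucleotide position."""
    return (pos - 1) % 3


def after_insertion(pos: int, site: int, inserted: int = 1) -> int:
    """New coordinate of an original position once `inserted` nucleotides are
    added at `site`; positions from `site` onwards move downstream."""
    return pos + inserted if pos >= site else pos


p17_atg, capsid_atg = 283, 366
downstream_codon = capsid_atg + 3 * 78                 # a capsid codon at nt 600
print(frame(p17_atg), frame(capsid_atg))               # 0 2  (different frames)
print(frame(after_insertion(downstream_codon, 570)))   # 0    (now in the P17 frame)
```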

Direct sequencing of the virus RNA using reverse transcriptase confirmed that 
the 4C version lacking the extra residue was the abundant form of the RNA. 

In order to exclude the possibility of a small amount of the RNA having the
extra residue, a sensitive PCR assay was designed. This showed that the extra
C residue was not present on any RNA in the viral population, and had been
introduced into some clones as a PCR artefact. These clones were, however,
retained and used in bacterial expression experiments (below) because of the
high level of expression obtained for the P65 and P70 fusion proteins.

vii) Comparison with the sequence of the Nudaurelia ω capsid gene

The sequence of most of the RNA 2 of the Nudaurelia ω virus has recently
been published by Agrawal D.K. and Johnson J.E. (Virology 190, 806-814,
1992). From the published sequence it has been determined that this sequence
shows 63% homology to that of HaSV RNA 2 at the nucleotide level and 66%
at the overall amino acid level. A detailed comparison of the capsid proteins
of these two viruses shows that the amino-terminal 45 residues are variable, the
next 220 residues are highly conserved, the next 180 residues are variable,
and the C-terminal 200 residues, covering the small protein P7, are highly
conserved. A more detailed comparison is discussed below.
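The percentages above do not state how homology was scored. One common convention, offered here only as an assumption, is to calculate percent identity over a pre-computed alignment while ignoring gapped columns:

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two sequences already aligned to equal length,
    with '-' for gaps; columns containing a gap are left out."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT-A", "ACGAGA"))  # 80.0
```

Other conventions, such as counting gaps as mismatches or using similarity scores for proteins, give different numbers for the same alignment.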

The published report did not find a complete reading frame corresponding to
the 157 amino acid protein (P17) gene reported above. The AUG is, however,
present, as is a reading frame starting upstream of the start of the capsid
gene and showing considerable amino acid homology to P17 of HaSV. In vitro
translation of purified Nudaurelia ω virus RNA 2 and a re-examination of the
nucleotide sequencing data for this RNA may help to resolve the question of
whether the Nudaurelia ω virus also encodes a protein homologous to the
HaSV P17.

More interestingly, antisera against these two viruses, which are similar at the
nucleotide sequence level, do not show any cross-reactivity.



viii) Construction of full-length clones
RNA 1

cDNA clone E3, described above, contains all but the 5'-terminal 18 nucleotides of RNA
1 and includes the complete ORF present in the sequence. The first full-length
clone of RNA 1 is therefore based on E3. The 4.9 kbp XbaI-ClaI
fragment from clone E3 was recloned into pBSKS(-) (Stratagene) cut with
XbaI and ClaI, giving pBSKSE3.

The full-length clone of RNA 1 was completed using PCR. The primer
defining the 5' end of the RNA carried an EcoRI site, the promoter for the
SP6 RNA polymerase and a sequence corresponding to the 5'-terminal 17 nucleotides of
RNA 1, as shown in Figure 1. The sequence of this primer was:
HvRlSPSp:

5'-GGGGGGAATTCATTTAGGTGACACTATAGTTCTGCCTCCCCGGAC
(The G which initiates transcription is underlined.)

Using an oligonucleotide complementary to nucleotides 1192-1212, a PCR
product of 1240 bp was efficiently made. The template was cDNA synthesised
using the MMLV RTase, with the same oligonucleotide complementary to
nucleotides 1192-1212 as the primer. Upon termination of the PCR
reaction, the product's ends were made blunt and then 5'-phosphorylated as
described below. The purified PCR fragment was then cleaved with restriction
endonuclease XbaI, and the 450 bp subfragment corresponding to the 5' end of
RNA 1 was cloned into the plasmid pBSSK(-) (Stratagene) cut with EcoRV and
XbaI, to give pBSR1S.
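The reported 1240 bp product length is consistent with the primer design. The forward primer carries a 28 nt 5' tail (GGGGGGAATTC plus the 17 nt SP6 promoter ATTTAGGTGACACTATA) in front of the sequence annealing to nucleotides 1 to 17 of RNA 1. The reverse primer ends at nucleotide 1212. Here is a sketch of the arithmetic, assuming no tail on the reverse primer:

```python
def amplicon_length(fwd_tail: int, fwd_start: int, rev_end: int, rev_tail: int = 0) -> int:
    """Length of a PCR product: non-annealing 5' tails plus the template span
    from fwd_start to rev_end (1-based, inclusive)."""
    return fwd_tail + (rev_end - fwd_start + 1) + rev_tail


# 11 nt (G-run + EcoRI site) + 17 nt SP6 promoter = 28 nt tail on the forward primer
print(amplicon_length(28, 1, 1212))  # 1240
```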

To assemble the full-length clone of RNA 1, pBSKSE3 (above) was cut with XbaI
and ScaI, giving fragments of 1.2 kbp and 6.8 kbp. pBSR1S was cut with the
same enzymes, giving fragments of 2 and 1.8 kbp. Ligation of the 6.8 kbp
fragment from pBSKSE3 and the 1.8 kbp fragment from pBSR1S yielded
pSR1(E3)A. Upon linearization at ClaI and in vitro transcription with the SP6
RNA polymerase, an RNA corresponding to RNA 1, and terminating in a
poly(A) stretch of about 50 nucleotides, is obtained.

Since the natural RNA 1 does not have a poly(A) tail, an alternative plasmid
was constructed which carries a BamHI restriction site immediately
downstream of the 3' end of RNA 1. Again this terminal fragment was made
using PCR as above. The sequence of the primer was as follows:

HvR13p: 5'-GGGGGGATCCTGGTATCCCAGGGGCGC (the nucleotide
complementary to the one determined to be the 3'-terminal nucleotide, based on its
adjacency to the poly(A) stretch, is underlined; RNA terminating at the
BamHI site will have the sequence GCGCCCCCUGGGAUACCaggauc).
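The run-off sequence stated above can be reproduced from the 5' portion of primer HvR13p. This assumes that transcription continues to the end of the template strand, which BamHI (G^GATCC) leaves with a 5' GATC overhang. The helper names are illustrative.

```python
def revcomp(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bamhi_runoff_end(sense: str) -> str:
    """Sense-strand sequence up to where run-off transcription would stop on a
    template linearised with BamHI (G^GATCC): the template strand keeps a 5'
    GATC overhang, so the transcript ends ...GGATC."""
    site = sense.upper().find("GGATCC")
    if site < 0:
        raise ValueError("no BamHI site")
    return sense.upper()[: site + 5]


primer_5p = "GGGGGGATCCTGGTATCCC"  # 5' portion of HvR13p
sense = revcomp(primer_5p)          # sense strand: same sequence as the RNA
rna_end = bamhi_runoff_end(sense).replace("T", "U")
print(rna_end)  # GGGAUACCAGGAUC
```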

The template was clone E3, and an oligonucleotide corresponding to
nucleotides 4084-4100 was the other primer. The 1220 bp product was blunt-ended,
kinased and gel-purified as described above, before cleavage with
HindIII. The resulting 420 bp subfragment corresponding to the 3' end of
RNA 1 was cloned into plasmid pSR1(E3)A cut with ClaI, end-filled with Klenow
and then cut with HindIII. The resulting plasmid is pSR1(E3)B. Upon
linearization at BamHI and in vitro transcription with the SP6 RNA
polymerase, an RNA corresponding to RNA 1, and terminating as described
immediately above, is obtained.

ix) RNA 2

In constructing the full-length cDNA clone to enable in vitro transcription of
this RNA, clone hr236 described above was used as a basis. Two separate PCR
products were required: one corresponding to the 5' portion of RNA 2, which is missing from
this clone altogether, and another covering the region where clone hr236 lacks
the hairpin-forming sequence described above.

The primer defining the 5' end of the RNA carried a HindIII site and a
sequence corresponding to the 5'-terminal 18 nucleotides of RNA 2, as shown in Figure
2. The sequence of this primer was:

Hr2cdna5: 5'-CCGGAAGCITGrrrriUri'lCl 1 lACCA

(The nucleotide underlined corresponds to that identified as the first
nucleotide of RNA 2.)

Using an oligonucleotide complementary to nucleotides 1653-1669, a PCR
product of 1.67 kbp was made. The template was cDNA synthesised using the
MMLV RTase and an oligonucleotide complementary to the 18 nucleotides at
the 3' end of RNA 2 as the primer. Upon termination of the PCR reaction,
the product was blunt-ended, kinased and gel-purified as described above,
before cleavage with PstI. The resulting 1.3 kbp subfragment corresponding to
the 5' half of RNA 2 was cloned into plasmid pBSSK(-) (Stratagene) cut with
EcoRV and PstI, giving plasmid pBSR25p. In order to place this subfragment
corresponding to the 5' half of RNA 2 downstream of the SP6 promoter for in
vitro transcription, a 1.3 kbp HindIII-BamHI fragment was excised from
pBSR25p and ligated into HindIII-BamHI cut pGEM-1 (Promega), giving
plasmid pSR25.

The second PCR product, covering the region where clone hr236 lacks the
hairpin-forming sequence described above, was synthesised using as primers
oligonucleotides corresponding to nucleotide sequence 873 to 889 of RNA 2
and to the complement of nucleotide sequence 2290-2309. Upon termination
of the PCR reaction, the product was blunt-ended, kinased and gel-purified as
described above, before cleavage with AatII. The resulting 1.1 kbp
subfragment covering the required region was cloned into plasmid phr236 cut
with HindIII, end-filled with Klenow and cut with AatII, giving plasmid
phr236P70.

The two segments were then joined together to cover the first 230 nucleotides of RNA 2.
Plasmid phr236P70 was cut at the SacI site in the vector adjacent to
the 5' end of the insert, and this end was made blunt using Klenow in the
absence of dNTPs. After heat-inactivation of the Klenow, the plasmid was cut
with EcoRI, yielding fragments of 4.5 kbp and 380 bp. Plasmid pSR25 was cut
with NheI, blunt-ended by end-filling with Klenow and cut with EcoRI, yielding
fragments of 2.8 kbp, 900 bp and 750 bp. The 4.5 kbp fragment of phr236P70
and the 900 bp fragment of pSR25 were ligated to give pSR2P70. This clone
covers all of RNA 2 except for the 3'-terminal 169 nucleotides.

To complete the full-length clone of RNA 2, it was necessary to insert a
fragment covering the 3' end. As with RNA 1, two versions were made. One,
called pSR2A, used the 3' end as present in phr236, together with the poly(A)
tail present in this version. The other, pSR2B, used a PCR fragment carrying a
BamHI site immediately downstream of the 3' nucleotide, as in pSR1(E3)B
above. To construct pSR2A, a 350 bp NotI-ClaI fragment was excised from
phr236 and cloned into pSR2P70 cut with the same endonucleases.
Linearization at the unique ClaI site allows in vitro transcription of the
complete RNA 2 and a poly(A) tail of about 50 nucleotides in length.

To make pSR2B, an appropriate PCR product was made using as primers an
oligonucleotide corresponding to nucleotide sequence 1178 to 1194 and an
oligonucleotide complementary to the 3'-terminal 18 nucleotides of RNA 2. The latter primer carried an
attached BamHI site, giving it the sequence:

HvR23p: 5'-GGGGGATCCGATGGTATCCCGAGGGACGC

The template used was plasmid phr236. Upon termination of the PCR
reaction, the product was blunt-ended, kinased and gel-purified as described
above, before cleavage with NotI. The resulting 400 bp subfragment covering
the required region was cloned into plasmid pSR2P70 cut with ClaI, end-filled
with Klenow and cut with NotI, giving plasmid pSR2B. Linearization at the
unique BamHI site allows in vitro transcription of the complete RNA 2,
terminating with the sequence ACCaggatc.

x) Construction of pSXR2P70 

This plasmid was made to determine where P24 starts. A 2.1 kbp XhoI-BamHI
fragment was cut from clone pSR2P70 and ligated into the vector
pGEM-1 (Promega), which had been cut with SalI and BamHI. In vitro
transcription of the resulting plasmid after linearization at the unique BamHI
site yielded an RNA covering about 70 nucleotides upstream of the first ATG
at nucleotides 283 to 285, plus a short sequence derived from the vector.

In vitro translation of the RNA from pSXR2P70 yielded both proteins (P70 and
P24).

xi) Description of virus-induced pathology

The virus induces a rapid anti-feeding effect in Helicoverpa larvae, as
determined by experiments with larvae, the results of which are shown in Fig.
3. Fig. 3 shows: A. Neonate larvae (less than 24 h old) were fed the designated
concentrations of isolated virus (in particles per ml of diet, added to solid
diet). They were weighed on the following days, and the mean of a statistically
significant number (24) of larvae is shown. Where necessary, mortality was
recorded for the higher concentrations. The vertical axis shows the fold-increase
in weight from the hatching weight of 0.1 mg per larva. This scale
therefore also corresponds to weight in units of 0.1 mg (i.e. 300 is equivalent to
30 mg). B. As for A, but the larvae were 5 days old at the start of the virus
feeding. The vertical scale is in mg weight.
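The two readings of the panel A vertical axis are related by the stated hatching weight. Here is a trivial conversion sketch; the function name is hypothetical.

```python
HATCH_WEIGHT_MG = 0.1  # hatching weight per larva, as stated for Fig. 3A


def fold_to_mg(fold_increase: float) -> float:
    """Convert a fold-increase over hatching weight into weight in mg."""
    return fold_increase * HATCH_WEIGHT_MG


print(round(fold_to_mg(300), 3))  # 30.0
```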

No weight gain at all was detectable in neonates which had been fed
doses of virus over 10^ particles per ml (virus added to diet). In addition,
100% mortality was evident after four days at the highest doses. Virus doses
as low as 10* particles per ml (virus added to diet) still cause significant
stunting. The five-day-old larvae showed a cessation of feeding after 48 hours
and significant stunting at 4 dpi, but no mortality at comparable virus doses
(Figure 3). Neonates are therefore very sensitive indeed to this virus. Virus
particles accumulate specifically in the midgut. This potent anti-feeding effect
may be due to the capsid protein or another protein encoded by the virus, or
to the effect of any combination of such proteins.

xii) Expression of virus-encoded proteins in bacteria
The vectors

The expression system used initially was derived from the pET-11 system
(Novagen). Trimmed-down versions of pET-11b and c were constructed and
used to compare expression of the capsid proteins. However, due to difficulties
experienced with this system, substantial modification of the original vectors
was carried out in order to achieve much higher yields. These results are
described in xiii-b) below.

The initial trimmed-down vectors discussed above were made as follows.
pGEM-2 (Promega), which carries the T7 promoter adjacent to a polylinker
sequence but has no sequences corresponding to the lac operon, was cut at the
unique XbaI (34) and ScaI (1651) sites, giving fragments of 1.61 and 1.25 kbp.
The plasmids pET-11b and c were cut with the same enzymes, giving fragments
of 4.77 and 0.91 kbp. The 1.61 kbp fragment of pGEM-2, carrying the
C-terminal portion of the ampicillin-resistance gene, the origin of replication and
the T7 promoter, was then ligated to the 0.91 kbp fragment of the pET vector,
which carries the Shine-Dalgarno sequence, the ATG (in
an NdeI site), the terminator for the T7 polymerase and the N-terminal portion
of the ampicillin-resistance gene. The resulting plasmids of approximately 2.53
kbp, called pT7T2-b and c, therefore carry a complete T7 transcription unit,
which may be used as an expression system in a manner similar to the original
pET-11 plasmids, but are repressor-neutral within the cell: they neither titrate
away repressor by carrying a binding site, nor do they carry the gene producing
the repressor. They were found to grow very well in E. coli strains JM109 and
BL21(DE3), and to be very efficient expression vectors. The repressor
present in the cells was found to be sufficient to keep the genomic T7
polymerase gene uninduced, and therefore the foreign gene unexpressed, in the
absence of IPTG.
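The fragment sizes quoted for pGEM-2 follow from treating the plasmid as a circle cut at two positions. The total length used below (2,870 bp) is an assumption chosen to be consistent with the quoted 1.61 + 1.25 kbp. It is not a figure from this text.

```python
def circular_fragments(length: int, cuts: list[int]) -> list[int]:
    """Fragment sizes (bp) from cutting a circular plasmid of `length` bp at the
    given positions (cut coordinates treated as points on the circle)."""
    if not cuts:
        return [length]
    cuts = sorted(cuts)
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    sizes.append(length - cuts[-1] + cuts[0])  # fragment spanning the origin
    return sizes


PGEM2_LENGTH = 2870  # assumed total length, consistent with 1.61 + 1.25 kbp
print(circular_fragments(PGEM2_LENGTH, [34, 1651]))  # [1617, 1253]
```

Adding the 0.91 kbp pET fragment to the 1,617 bp piece gives about 2.53 kbp, which is the size stated for pT7T2-b and c.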

xiii-a) Construction of plasmids for expression of capsid proteins
In this section, all proteins expressed from segments of HaSV RNA 2 are
referred to by the size of their gene, as defined in Fig. 4 and in section vi) of
this example. The following plasmids were constructed by PCR, using the
abovementioned full-length clone of RNA 2, plasmid pSR2A, as the template,
except where mentioned otherwise.

Groups of plasmids expressed protein starting at each of the first three
methionine initiation codons found in the sequence of HaSV RNA 2. For
those proteins initiating at the first methionine initiation codon found in the
sequence of HaSV RNA 2 (which initiates the P17 gene; oligonucleotide
primer HVPET65N), an extra group of plasmids was made by PCR using as a
template the version of the RNA 2 sequence carrying an extra C residue
inserted at residue 570 (as depicted in Figure 2). Expression constructs
initiating at the third methionine initiation codon found in the sequence of
HaSV RNA 2 (which is located within the P17 gene; oligonucleotide primer
HVPET63N) were made by PCR using as a template only the version of the
RNA 2 sequence carrying an extra C residue inserted at residue 570. For
these latter expression constructs, as well as those designed to initiate
expression from the second methionine initiation codon found in the sequence
of HaSV RNA 2 (which initiates the P71 gene; oligonucleotide primer
HVPET64N), two versions were constructed.

One version terminated at a point corresponding to the C-terminus of the
processed (P64) form of the capsid protein and was made using
oligonucleotide primer HVP65C. The other version terminated at a point
corresponding to the C-terminus of the precursor (P71) form of the capsid
protein and was made using oligonucleotide primer HVP6C2.

The sequence encoding P64 (or the precursor, P71) was synthesised in two
segments using PCR. The amino-terminal half of the gene was obtained using
as primers oligonucleotides incorporating one of the three possible ATG
initiation codons for the ORF, in addition to an oligonucleotide with the
sequence TCAGCAGGTGGCATAGG, complementary to nucleotides 1653 to
1669 of the sequence shown in Fig. 2. The forward primers were as follows:
HVPET65N: 

AAATAATirrGTTTACTTTAGAAGGAGATATACATATGAGCGAGCGA 
GCACAC 

(the underlined sequence corresponds to nucleotides 283 to 296 of the
sequence shown in Figure 2)

HVPET63N 

AAATAATmGTITAACCTr>4AGAAGGAGAT 
GCGTCAC 

(the underlined sequence corresponds to nucleotides 373 to 390 of the
sequence shown in Figure 2; the AflII (CTTAAG) and BglII (AGATCT) sites
introduced into the sequence by single nucleotide changes (shown in italics) in
the oligonucleotide are shown in bold).

HVPET64N

nrwAOATrTACA TATnGGAGATGCTGGAGTG 
(the underlined sequence corresponds to nucleotides 366 to 383 of the
sequence shown in Figure 2; the BglII site introduced into the sequence by a
single nucleotide change in the oligonucleotide is shown in bold).

The PCR products obtained from each combination of one of these primers
with the abovementioned one were treated with the Klenow fragment of E. coli
DNA polymerase, and then with T4 polynucleotide kinase in the presence of 1
mM ATP, before purification by agarose gel electrophoresis as described
above. Each product was then cleaved with AatII to yield fragments of 0.95
and 0.4 kbp, and each resulting fragment of about 0.95 kbp was cloned into vector
pGEM-2 (Promega) cut with HincII and AatII, giving plasmids pGEMP63N (in
which the insert commenced with oligonucleotide HVPET63N), pGEMP64N
(in which the insert commenced with oligonucleotide HVPET64N) and
pGemP65N (in which the insert commenced with oligonucleotide
HVPET65N). The fragment covering this portion of the HaSV capsid gene was
then excised with enzymes AatII and XbaI.

Two versions of plasmid pGemP65N were made, using different templates as
described above. pGemP65N was derived from the sequence of the viral
RNA, as in plasmid pSR2A; plasmid pGemP65Nc was derived from the
sequence carrying an extra C residue, as shown in Fig. 2 (see "5C version").

In parallel, the carboxy-terminal halves of the major capsid protein variant,
whether terminating as for P64 or for P71, were also produced using PCR. An
oligonucleotide primer, HVRNA2F3, with the sequence
GTAGCGAACGTCGAGAA (corresponding to nucleotides 873 to 889 of the
sequence shown in Figure 2) was used in conjunction with each of the two
following primers:

HVP65C 

rwnonnATrrT rArTTTnTTAGTGGCG nGnTAG

(the underlined sequence is complementary to nucleotides 2072 to 2091 of the
sequence shown in Figure 2).

HVP6C2 

nnnnATrr rTAATTGGr ArnAGrGGCGC

(the underlined sequence is complementary to nucleotides 2290 to 2309 of the
sequence shown in Figure 2).
The PCR products obtained from each combination of one of these primers
with the abovementioned one (HVRNA2F3) were treated with the Klenow
fragment of E. coli DNA polymerase, and then with T4 polynucleotide kinase in
the presence of 1 mM ATP, before purification by agarose gel electrophoresis
as described above. Each product was then cleaved with AatII to yield
fragments of 0.9 kbp (in the case of HVP65C) or 1.1 kbp (in the case of
HVP6C2) and 0.4 kbp, and each resulting fragment of about 0.9 or 1.1 kbp was
cloned into plasmid phr236 cut with HindIII, treated with Klenow and cut with AatII,
giving plasmids phr236P65C and phr236P70 (which has already been described
above), respectively. The fragment covering the C-terminus of the capsid
protein gene was then excised with enzymes AatII and BamHI.

To assemble plasmids for expression in suitable strains of E. coli, the excised
XbaI-AatII fragments of 0.95 kbp covering the amino-terminal half of the gene
and the excised AatII-BamHI fragments of 0.9 or 1.1 kbp covering the
carboxy-terminal half of the gene were simultaneously ligated into the vector
pT7T2 cut with XbaI and BamHI. Initial transformation was of E. coli strain
JM109. Recombinant plasmids carrying the correct insert were then
transformed into strain BL21(DE3) for expression as described above.
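The three-fragment ligation relies on each junction being formed by matching ends. A simplified sketch can check this bookkeeping. It treats ends as compatible only when they are made by the same enzyme, and it ignores orientation, blunt ends and cross-compatible pairs such as XhoI/SalI. The 2,500 bp vector size is a hypothetical placeholder.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    name: str
    left_end: str    # enzyme that produced the left-hand end
    right_end: str   # enzyme that produced the right-hand end
    size_bp: int


def assemble_circular(fragments: list[Fragment]) -> int:
    """Check that each fragment's right end matches the next fragment's left end
    (the last joining back to the first) and return the circular product size."""
    for i, frag in enumerate(fragments):
        nxt = fragments[(i + 1) % len(fragments)]
        if frag.right_end != nxt.left_end:
            raise ValueError(f"{frag.name} ({frag.right_end}) cannot join "
                             f"{nxt.name} ({nxt.left_end})")
    return sum(f.size_bp for f in fragments)


parts = [
    Fragment("amino-terminal half", "XbaI", "AatII", 950),
    Fragment("carboxy-terminal half (HVP65C)", "AatII", "BamHI", 900),
    Fragment("pT7T2 vector, XbaI + BamHI", "BamHI", "XbaI", 2500),  # size hypothetical
]
print(assemble_circular(parts))  # 4350
```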

The plasmid obtained by ligating the amino-terminal fragment commencing
with oligonucleotide primer HVPET63N to the C-terminal fragment ending at
oligonucleotide primer HVP65C in the expression vector pT7T2b was called
pP65G.

In the case of plasmid pP64N, containing an insert from HVPET64N to
HVP65C, the fragment covering the amino-terminal half of the gene
was excised with BglII and ScaI from the plasmid pGemP64N, and the fragment
covering the remainder of the gene was excised with ScaI and EcoRI from
plasmid pT7T2-P65. These two fragments were then ligated simultaneously
into pP65G, which had been cut with BglII and EcoRI.
The resulting construct carrying the complete P71 precursor gene was called
pT7T2-P71, and that carrying the P64 form of the gene was called pT7T2-P64.
In the case of plasmids derived from pGemP65N and pGemP65Nc, carrying
inserts commencing as defined by primer HVPET65N, the expression plasmid
derived from pGemP65N, which is based on PCR products made using as the
template the sequence of the viral RNA, as in plasmid pSR2A, was called
pTP17. A truncated form of this plasmid, which expresses P17, was made by
cutting at the unique BglII and BamHI sites, removing the intervening
fragment (which corresponds to the C-terminal part of the insert) and religating
the compatible cohesive ends, to give pTP17delBB. The expression plasmids
derived from plasmid pGemP65Nc (which was derived from the sequence
carrying an extra C residue) were called pT7T2-P65 (carrying an insert
terminating at the primer HVP65C) and pT7T2-P70 (carrying an insert
terminating at the primer HVP6C2).

Expression of P6 

Two forms of this protein were made. The protein arises through processing
of the large capsid protein variant precursor P70 and therefore lacks its own
initiation codon. One form (protein MA) replaced the phenylalanine at the start
of this protein with methionine, giving it the amino-terminal sequence MAA...;
the other carries an additional methionine residue, giving it the amino-terminal
sequence MFAA.... The oligonucleotides used for PCR-amplified products
covering the P6 coding sequence carried an NdeI site (bold) at the ATG codon,
for direct ligation into the pET-11 vectors. The primers used were:

HVP6MA: AATTACATATGGCGGCCGCCGTTTCTGCC

HVP6MF: AATTACATATGTTCGCGGCCGCCGTTTCT
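The amino-terminal sequences stated for the two P6 constructs can be checked by translating each primer from the ATG within its NdeI site (CATATG), using the standard genetic code. The sequences are those of HVP6MA and HVP6MF. The helper name is illustrative.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}


def translate_from_ndei(primer: str) -> str:
    """Translate the primer from the ATG inside its NdeI site (CATATG)."""
    primer = primer.upper()
    site = primer.find("CATATG")
    if site < 0:
        raise ValueError("no NdeI site")
    start = site + 3  # the ATG is the last three bases of CATATG
    codons = (primer[p:p + 3] for p in range(start, len(primer) - 2, 3))
    return "".join(CODON[c] for c in codons)


print(translate_from_ndei("AATTACATATGGCGGCCGCCGTTTCTGCC"))  # MAAAVSA
print(translate_from_ndei("AATTACATATGTTCGCGGCCGCCGTTTCT"))  # MFAAAVS
```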

Each of these primers was used in conjunction with primer HVP6C2 to
generate a PCR product of 0.2 kbp. These products were blunt-end ligated
into vector pBSSK(-), which had been cut with EcoRV and dephosphorylated.
The insert corresponding to the P6 gene was excised with NdeI and BamHI
(using the BamHI site in the primer HVP6C2) and ligated into the expression
vector pET-11b, which had been cut with the same enzymes. For expression at
higher levels, the insert was transferred to pT7T2 as an XbaI-BamHI
fragment, yielding plasmids pTP6MA and pTP6MF.

IPTG induction of bacteria containing plasmids pTP6MA or pTP6MF was
used to produce P6 for bioassay.

xiii-b) Expression of viral genes in E. coli and bioassay in larvae
Expression of P64

IPTG induction of bacteria containing plasmid pT7T2-P65, which contains an
insert running from the location of primer HVPET65N to that of primer
HVP65C, yielded a protein of molecular weight 68,000. This was 3,000
greater than the molecular weight of the authentic coat protein, as
expected. Expression of pP65G, which contains an insert running from
HVPET63N to HVP65C, yielded a protein of molecular weight 65,000.

The authentic capsid protein (P64) was expressed poorly from plasmid pT7T2-P64. Recloning this insert as an NdeI-BamHI fragment into the other form of the vector (pT7T2b) did not change this.

Expression of P70 

IPTG induction of bacteria containing plasmid pT7T2-P70, which contains an insert running from the location of primer HVPET65N to that of primer HVP6C2, yielded a protein with a molecular weight of 73 000. As expected, this was 3 000 greater than the molecular weight of the coat protein precursor.

The authentic capsid protein precursor (P71) was expressed poorly from plasmid pT7T2-P71. Recloning this insert as an NdeI-BamHI fragment into the other form of the vector (pT7T2b) did not change this.
Because of the observation mentioned in vi) above, plasmids were constructed to express all forms of the capsid proteins from several possible ATGs at the start of the open reading frame.

Both authentic P64 and P71 were found to be expressed poorly in bacteria. In contrast, P17 and the forms of the capsid protein that begin at the P17 ATG were expressed very well. The extra C residue in the latter two constructs caused these expression plasmids to produce a fusion protein. The sequence of the fusion proteins can be derived from Fig. 2 by inserting an extra C at position 570. In the fusion, the first 67 residues of the HaSV capsid protein are replaced by the first 95 residues of P17. The large capsid precursor and the capsid protein were both expressed well, but these proteins were about 3 kDa larger than the authentic forms. The expression products of the vectors containing the 5C variant of RNA 2 are nevertheless still useful. The resulting product, a P70 variant, is modified only at the NH2 terminus, and this terminus is thought to be embedded in the capsid structure and therefore not to take part in the initial interaction with the larval midgut cell.
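Simple arithmetic accounts for the size shift. Replacing the first 67 capsid residues with the first 95 residues of P17 adds 28 residues net. The sketch below assumes an average residue mass of about 110 Da. That value is a common rule of thumb, not a figure from this specification, and the function name is hypothetical.

```python
# Rule-of-thumb average mass of an amino acid residue in a protein (assumption).
AVERAGE_RESIDUE_MASS_DA = 110.0


def fusion_mass_shift_kda(residues_removed: int, residues_added: int,
                          avg_residue_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Approximate change in molecular mass (kDa) when one N-terminal
    segment is replaced by another of a different length."""
    net_residues = residues_added - residues_removed
    return net_residues * avg_residue_da / 1000.0


# First 67 capsid residues replaced by the first 95 residues of P17.
shift = fusion_mass_shift_kda(residues_removed=67, residues_added=95)
print(f"net +{95 - 67} residues, about {shift:.1f} kDa larger")
```

The estimate of roughly 3 kDa agrees with the observed size difference of the fusion products.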

To ensure that the expressed proteins had the native amino terminus, new plasmids carrying the correct sequence were then cloned into the expression vector (pT7T2). These plasmids were found to express proteins of the correct size.

P6 has not yet been expressed from the new constructs. No evidence has been found for processing of P70 into the mature proteins, either in bacteria or upon in vitro translation of synthetic full-length RNA 2.

The P17 gene has also been cloned into the same vectors for expression and bioassay. This protein accumulates well in bacteria upon induction. Electron microscopy has shown that it forms spectacular honeycomb-like structures under the bacterial cell wall, completely surrounding the cell interior (results not shown). The properties of this protein, including its amino acid composition and its ability to form tube-like structures when expressed in bacteria, suggest that it may be a homolog of a gap junction protein. Gap junction proteins help form the channels that link the cytoplasms of adjacent epithelial cells in the insect gut. P17 could therefore play a role in forming or enlarging these channels, thereby enabling cell-to-cell movement of the virus in the insect gut. This would be analogous to the movement or spreading proteins encoded by plant RNA viruses.

To ensure that the expressed proteins carried the native amino terminus, the correct sequence has also been cloned into the expression vector (pT7T2). The vector had been modified very slightly from the one described above, to introduce two novel restriction sites (for AflII and BglII) flanking the Shine-Dalgarno sequence. The resulting constructs have been found to produce the capsid proteins poorly. The complete coding regions, which have been fully checked by re-sequencing, have therefore been recloned into the more satisfactory vectors. Results with these constructs suggest that the amino terminus of the capsid protein presents inherent difficulties in expression. These difficulties may arise either from the nucleotide sequence encoding the amino terminus or from the amino acid sequence itself. To distinguish between these possibilities, two types of mutants were made in the sequence encoding the amino-terminal 5 residues of the HaSV capsid protein. These amino-terminal mutants are as follows:



HVP71GLY:

CCCATATG GGC GAT GCC GGC GTC GCG TCA CAG
Met Gly Asp Ala Gly Val Ala Ser Gln

HVP71SER:

CCCATATG AGC GAG GCC GGC GTC GCG TCA CAG
Met Ser Glu Ala Gly Val Ala Ser Gln
Native HaSV sequence:

ATG GAG GAT GCT GGA GTG GCG TCA CAG
Met Glu Asp Ala Gly Val Ala Ser Gln
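The sketch below makes the design of the two mutants explicit. It translates each coding sequence with the standard genetic code and classifies every codon that differs from the native sequence as synonymous (same amino acid) or non-synonymous. It assumes that the native codon printed as GCX3 is GCG, as in the two mutants. Under the standard code, the native GAG codon at position 2 encodes glutamate. The helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Coding sequences from the ATG onward (the CCCATATG primer prefix is omitted).
NATIVE = "ATG GAG GAT GCT GGA GTG GCG TCA CAG".split()  # GCG assumed for "GCX3"
MUTANTS = {
    "HVP71GLY": "ATG GGC GAT GCC GGC GTC GCG TCA CAG".split(),
    "HVP71SER": "ATG AGC GAG GCC GGC GTC GCG TCA CAG".split(),
}


def translate(codons):
    """Translate a list of codons into one-letter amino acid codes."""
    return "".join(CODON_TABLE[c] for c in codons)


def classify_changes(reference, variant):
    """Count codon changes that keep (synonymous) or alter (non-synonymous)
    the encoded amino acid."""
    synonymous = nonsynonymous = 0
    for ref, var in zip(reference, variant):
        if ref == var:
            continue
        if CODON_TABLE[ref] == CODON_TABLE[var]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


print("native  ", translate(NATIVE))
for name, codons in MUTANTS.items():
    syn, nonsyn = classify_changes(NATIVE, codons)
    print(f"{name:8}", translate(codons), f"synonymous={syn} non-synonymous={nonsyn}")
```

On this reading, both mutants carry the same three silent changes in codons 4 to 6 and differ only in their amino acid substitutions near the start of the protein.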




EXAMPLE 4 

EXPRESSION IN BACULOVIRUS VECTORS AND BIOASSAY ON LARVAE

Materials and Methods 

A(i) Cloning of HaSV capsid protein gene.

The capsid protein gene was amplified by PCR using the following primers:
5' primers:
HV17V71: 

5' GGGGGATCCCGCGGATTTATGAGCGAG
HV17E71:

5' GGGGGATCCCGCGGAGACATGAGCGAGCACAC
HVP71:

5' GGGGGATCCAGCGACATGAGAGATGCTGGAGTGG
HVV71:

5' GGGGGATCXIAGCGACATGAGAGATGCTGGAGTGG
In each primer, the ATG triplet initiates either P17 (HV17V71 and HV17E71) or P71 (HVP71 and HVV71).

3' primers:

Primers HVP65C and HVP6C2, described in Example 3, Results section xiii-a, were used. These constructs were made using one of the four 5' primers and HVP6C2. Plasmids constructed from PCR products made using one of the four 5' primers and HVP65C are called 17V64 (made using 5' primer 17E71), P64 (made using 5' primer P71) and V64 (made using 5' primer V71). These plasmids allow expression of P64.
A(ii) Cloning a full-length cDNA of HaSV RNA 1

To express an RNA transcript corresponding to full-length HaSV RNA 1 in insect cells, by baculovirus infection or plasmid transfection, PCR was used to generate a cDNA fragment linking the 5' end of RNA 1 to a BamHI site.

The primers were: 
HVR1B5' 

5' GGGGGATCCGTTCTGCXnXXCCGGAC

(where the underlined nucleotide represents the start of natural RNA 1), and an oligonucleotide complementary to nucleotides 1192-1212 of RNA 1. The template was plasmid pSR1(E3)B, described in Example 3 above.

A segment of the 1240 bp PCR fragment, corresponding to the 5' 320 nucleotides of RNA 1, was excised with BamHI and AscII. It was cloned into the BamHI site of pBSSK(-) [Stratagene] together with the 5 kbp AscII-BamHI fragment of pSR1(E3)B, giving plasmid pBHVR1B, which carries the complete cDNA of HaSV RNA 1 flanked by BamHI sites.

20 

A(iii) Cloning a full-length cDNA of HaSV RNA 2.
To express an RNA transcript corresponding to full-length RNA 2 in insect cells, by baculovirus infection or plasmid transfection, plasmid pB+NR2B was made by inserting a fragment carrying HindIII and BamHI sites from the multiple cloning site of vector pBSSK(-) [Stratagene] into plasmid pSR2B, described above. The resulting plasmid, called pBHVR2B, carried the cDNA corresponding to full-length HaSV RNA 2, flanked by BamHI sites.

A(iv) Baculovirus transfer plasmids.
BamHI fragments of 5.3 and 2.5 kbp, corresponding to HaSV RNAs 1 and 2 respectively, were excised from pBHVR1B and pBHVR2B. They were inserted into the baculovirus transfer vectors described below, which had been linearised with BamHI.

B. Baculovirus Expression of Proteins.
Baculovirus transfer vectors and engineered AcMNPV virus were transfected into Spodoptera frugiperda (SF9) cells as described by the supplier (Clontech) and in the following references:

Vlak, J.M. & Keus, R.J.A. (1990) in "Viral Vaccines", Wiley-Liss Inc., NY, pp. 92-128; Kitts, P.A. et al. (1990) Nucleic Acids Research 18: 5667-5672; Kitts, P.A. and Possee, R.D. (in preparation); Possee, R.D. (1986) Virus Research 5: 43-59.

C. Western Blotting
As in Example 1.

D. Oligonucleotides

The following ribozyme oligonucleotides were produced by standard methods.

HVR1Qa

5' CCATCGATGCCGGACTGGTATCCCAGGGGG

5'HVR2aa

5' CCATCGATGCCGGACTGGTATCCCXjAGGGAC
RZHDV1

5' CCATCGATGATCCAGCCTCCTCGCGGCGCCGGATGGGCA
RZHDV2

5' GCTCTAGATCCATTCGCCATCCGAAGATGCCCATCCGGC
RZHC1

5' CCATCGATTTATGCCGAGAAGGTAACCAGAGAAACACAC
RZHC2

5' GCTCTAGACCAGGTAATATACCACAACGTGTGTT^ 
Results 

A series of recombinant baculoviruses has been constructed, based on AcMNPV and either the pVL941 transfer vector (PharMingen) or pBacPAK8 (Clontech). These are designed to express the correct forms of the precursor and processed HaSV capsid proteins (P64 and P71), as well as the smaller capsid protein P6, and P17. In any system that uses replicatable RNA encoding the nucleotide sequences of the present invention, such as eukaryotic systems, the RNA can only be replicated, translated or encapsidated efficiently if structures downstream of the tRNA-like structure are excised. These structures include the 3' extension or the poly(A) tail on the RNA. Ribozymes or other suitable mechanisms may be used to carry out this excision. The self-cleavage of the ribozyme-containing transcript should proceed at a rate that lets most of the transcript be transported into the cytoplasm of the cell before a replicatable 3' end is regenerated. Such ribozyme systems are explained more fully in Example 7. In the results presented here, P64 and P71 have been produced with high efficiency. Electron microscopy and density gradient analysis have confirmed that empty particles ("capsoids") are produced in infected cells that efficiently express the P71 precursor gene. P17 placed in the context of the H. virescens juvenile hormone esterase (JHE) gene (Hanzlik, T.N., et al., J. Biol. Chem. 264, 12419-25 (1989)) is produced, but not in large amounts. This construct also reduces expression of the capsid protein from the same recombinant, presumably because fewer ribosomes reach the AUG of the capsid gene.

Electron microscopy has shown that SF9 cells infected with recombinant baculovirus contain large amounts of icosahedral virus particles (data not shown). These particles contained no RNA and were empty inside. The RNA transcripts made using recombinant baculovirus lacked the 5' 270 nucleotides and the 3' 170 nucleotides of the viral RNA. The signals on the viral RNA required for encapsidation must therefore lie in one of these regions, or in both. Expression of HaSV proteins was confirmed by Western blotting of total protein extracts from infected insect cells.

In addition, the pAcUW51 vector (Clontech), which carries two promoters, is being used to express p6 and p64 simultaneously as separate proteins.

To bioassay the capsid protein produced in baculovirus-infected cells, the protein must first be purified away from the baculovirus expression vector. Preliminary attempts have used density gradients, because empty virus particles ("assembled capsids") are in fact produced in infected cells.

As outlined earlier, the HaSV genome, or a portion of it, is a particularly effective insecticidal agent for insertion into baculovirus vectors. Such a vector is constructed by inserting the complete virus genome, or a portion of it (preferably the replicase gene), into the baculovirus genome, as shown in Fig. 13. The virus genome or replicase is preferably transcribed from a promoter that is either active constitutively in insect cells or active at early stages of baculovirus infection. One example of such a promoter is the heat shock promoter described in Example 7. Heat shock promoters are also activated in stressed cells, for example cells stressed by baculovirus infection. A still more preferable construct uses the HSP promoter to drive both the HaSV replicase and a toxin gene (as exemplified elsewhere in the specification). In this construct, the RNA expressing the toxin gene can be replicated by the HaSV replicase. Recombinant baculoviruses that carry the HaSV genome, or portions of it, for expression in larvae at early or other stages of the baculovirus infection cycle are particularly effective biological insecticides.
EXAMPLE 5

EFFECT OF HaSV GENES AND THEIR PRODUCTS ON PLANTS
Materials and Methods
A. Electroporation of protoplasts.
Protoplasts of Nicotiana tabacum, N. plumbaginifolia, Triticum aestivum and oats were produced and electroporated with either HaSV or HaSV RNA as described in Matsunaga et al. (1992) J. Gen. Virol. 73: 763-766.

B. Northern blot analysis: RNA extraction from protoplasts after harvest
The protoplasts are subjected to 3 cycles of freezing and thawing. An equal volume of 2x extraction buffer (100 mM Tris-HCl, pH 7.5, 25 mM EDTA, 1% SDS, made in DEPC-treated water) is then added, followed by 1 volume of phenol (equilibrated in 10 mM Tris-HCl, pH 8.0) heated to 65 °C. The samples are mixed by vortexing and incubated at 65 °C for 15 min, with vortexing every 5 min. The phases are separated by centrifugation at room temperature for 5 min. The aqueous phase is then re-extracted with phenol, separated again by centrifugation and re-extracted with chloroform/isoamyl alcohol. To the aqueous phase are then added 0.1 volume of DEPC-treated sodium acetate (pH 5.0) and 2 volumes of ethanol. The RNA is recovered by precipitation at -70 °C, followed by centrifugation at 4 °C for 15 min. The samples were then analysed by agarose gel electrophoresis as described in Example 1.

After blotting to Zeta-Probe membrane (Bio-Rad), hybridization was carried out as in Example 2.

C. Total protein from HaSV-electroporated protoplasts.
Protoplasts were analysed by SDS-polyacrylamide gel electrophoresis and Western blotting as described in Example 1.
Results

i) Use of complete (replication-competent) RNA virus genome in protoplasts

a) HaSV replication in protoplasts 

The nodavirus FHV has previously been shown to replicate in barley protoplasts (Selling, B.H., Allison, R.F. and Kaesberg, P. Proc. Natl. Acad. Sci. USA 87, 434-8 (1990)). To determine whether HaSV RNA can replicate in plant protoplasts when introduced by electroporation, experiments have been conducted with protoplasts from Nicotiana plumbaginifolia and wheat. (Protoplasts of these species are regularly available in the Division of Plant Industry.) Replication was assayed by RNA (Northern) blots, using probes derived from cloned cDNA fragments of RNAs 1 and 2, and by Western blots, using the antiserum to purified HaSV particles. Initial experiments showed that both HaSV virus and HaSV RNA electroporated into protoplasts of N. plumbaginifolia led to HaSV replication, as verified by Northern blots and ELISA. As a positive control, TMV RNA was electroporated, and its replication was observed.

b) Bioassays 

Protoplasts into which HaSV RNA had been introduced by electroporation were harvested 6 or 7 days after electroporation. They were then bioassayed on neonate larvae by adding them to normal diet. Test larvae were significantly stunted compared with control larvae (see Table 1 below). Protoplasts lacking HaSV RNAs had no effect on the larvae, which confirms the result of the control experiments. This result confirms that HaSV RNA expressed or replicated in plant cells can give rise to infectious virus particles that control insect larvae feeding on the plant material.

Northern blotting has confirmed that electroporating RNA into protoplasts leads to RNA replication.
Table 1: Bioassay results from a typical experiment with Nicotiana and oat protoplasts (oat results in brackets)

Treatment                      Number      Escapes     Number stunted
1. diet only                   12 (12)     2 (3)       0/10 (0/9)
2. diet + protoplasts          12 (12)     0 (1)       0/12 (0/11)
3. HaSV + diet                 12 (12)     0 (1)       12/12 (11/11)
4. diet + HaSV/protoplasts     12 (n.d.)   0 (n.d.)    12/12 (n.d.)
5. diet + RNA/protoplasts      12 (12)     0 (0)       11/12 (10*/12)



* HaSV replication was confirmed in all larvae except two, which were dead. "n.d." means the experiment was not done.

These results demonstrate that HaSV particles assemble from electroporated RNA in protoplasts of both monocot and dicot plant species.
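In Table 1, the denominator of each "Number stunted" entry is the number of larvae tested minus the escapes. The sketch below re-enters the counts from the table, checks that relationship row by row, and reports the stunted fraction for each treatment. The asterisked oat entry (10*/12) is taken at face value, and the oat row for treatment 4 is omitted because that experiment was not done. The dictionary layout and function name are illustrative only.

```python
# Table 1 counts: (larvae tested, escapes, larvae stunted, denominator as printed).
TABLE_1 = {
    "Nicotiana": {
        "diet only": (12, 2, 0, 10),
        "diet + protoplasts": (12, 0, 0, 12),
        "HaSV + diet": (12, 0, 12, 12),
        "diet + HaSV/protoplasts": (12, 0, 12, 12),
        "diet + RNA/protoplasts": (12, 0, 11, 12),
    },
    "oat": {
        "diet only": (12, 3, 0, 9),
        "diet + protoplasts": (12, 1, 0, 11),
        "HaSV + diet": (12, 1, 11, 11),
        "diet + RNA/protoplasts": (12, 0, 10, 12),
    },
}


def stunted_fractions(rows):
    """Check that each printed denominator equals tested minus escapes,
    then return the fraction of assessed larvae that were stunted."""
    fractions = {}
    for treatment, (tested, escapes, stunted, printed_denominator) in rows.items():
        assessed = tested - escapes
        if assessed != printed_denominator:
            raise ValueError(f"{treatment}: expected denominator {assessed}")
        fractions[treatment] = stunted / assessed
    return fractions


for species, rows in TABLE_1.items():
    for treatment, fraction in stunted_fractions(rows).items():
        print(f"{species:<10}{treatment:<26}{fraction:6.0%}")
```

Every printed denominator matches tested minus escapes, so the table is internally consistent.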


c) Plasmids to test replication of cloned and engineered forms of HaSV

(1) Plasmids allowing in vitro transcription of HaSV RNAs 1 and 2 for 
electroporation into protoplasts have already been described above. 

(2) Plasmids for transient expression of individual HaSV RNAs (1 or 2) in protoplasts. Full-length cDNAs of the two viral RNAs have been inserted into two expression plasmids: pDH51 (with the CaMV 35S promoter; Pietrzak, M., et al. (1986) Nucl. Acids Res. 14, 5857-68) for dicots, and pAct1.cas (with the rice actin promoter; McElroy et al. (1990) The Plant Cell 2: 163-171) for monocots. As with the vectors for expression in insect cells, these expression plasmids are being modified to include a cis-acting ribozyme that generates authentic ends. The plasmids without a ribozyme gave no virus replication.
ii) Expression of capsid protein in plants

The present inventors observed that empty particles ("assembled capsids") are produced in baculovirus-infected cells that efficiently express the P71 precursor gene. In view of this, expression of the capsid protein coding region in tobacco plants was investigated. The vector chosen for this purpose is based on pDH51, which carries the CaMV 35S promoter and polyadenylation signal. If needed to improve expression, this vector can be modified by adding a translation enhancer sequence, for example from TMV. Certain groups have constructed transgenic plants expressing the capsid proteins of plant viruses, but there has been only one recent report of empty capsids assembling in such plants (Bertioli et al. (1991) J. Gen. Virol. 72: 1801-9). Bertioli et al. point out that the protein-protein interactions in most icosahedral plant RNA viruses may be too weak to allow such capsids to assemble. HaSV, however, does form empty capsids, and these capsids have also been found to be very tough, withstanding for example repeated cycles of freezing and thawing. Empty HaSV capsids ("assembled capsids") are therefore expected to assemble in transgenic plants.

EXAMPLE 6

IDENTIFICATION OF MIDGUT BINDING DOMAINS
Materials & Methods

A. Plasmid construction

As described in Examples 3 and 4.

B. Western blotting

As described in Examples 1 and 3.

C. In vitro translation

In vitro transcripts of cloned cDNA of HaSV RNAs were translated in vitro as in Examples 1 and 3.


D. Preparation of Brush Border Membrane Vesicles.
Brush border membrane vesicles were prepared from freshly isolated larval midguts of H. armigera by the method of Wolfersberger, M. et al. (1987) Comp. Biochem. Physiol. 86A: 301-308, as modified by Garczynski, S.F. et al. (1991) Applied Environ. Microbiol. 57: 2816-2820. Binding assays with brush border membrane vesicles, using in vitro labelled protein or Habelled protein, were performed as described in Garczynski et al. (1991) or in Horton, H.M. and Burand, J.P. (1993) J. Virol. 67: 1860-1868.

Results 

i) Determination of epitopes on the capsid surface

The sequence of the Nudaurelia ω virus (NωV) capsid protein was recently published. Comparing it with the HaSV capsid protein showed that the two proteins are closely related. Both fall into four distinct domains, which alternate between variable and highly conserved. These domains are summarised as follows:



Residues:     HaSV  1-49   50-272   273-435   437-647
              NωV   1-46   47-269   270-430   431-645
% identity:         37     81       34        81



This observation was compared with the alignment by Agrawal and Johnson (1992) between NωV and the nodavirus BBV, whose crystal structure is known (Hosur et al. (1987) Proteins: Structure, Function & Genetics 2: 167-176). The comparison showed that the variable region coincides with the region forming the most prominent surface protrusion on the BBV capsid. Relative to BBV, both HaSV and NωV carry large insertions at this point, and these insertions differ largely in sequence. If the alignment by Agrawal and Johnson (1992) is correct, HaSV and NωV therefore have more prominent pyramid-like surface protrusions than the nodaviruses, and these pyramid-like structures differ between the two viruses. As already noted, there is no immunological cross-reactivity between the two viruses, despite their high degree of identity. This strongly implies that the variable domain forms a surface protrusion that acts as the sole antigenic region.

To confirm this, a 400 bp NarI fragment spanning the variable region was deleted from the capsid gene in the expression vector. When these sites are end-filled, the deletion is in-frame, so bacteria produce a truncated protein of ca. 57 kDa upon induction. On Western blots, this protein was recognized only poorly by rabbit antiserum raised against intact HaSV particles. By contrast, the antiserum recognized the central variable domain well when it was expressed in isolation from the rest of the capsid gene.

As shown in the table above, the region of the HaSV capsid protein comprising residues 273-439 diverges strongly from the corresponding region of the NωV capsid protein, compared with its immediate flanking regions. Within this region, residues 351 to 411 form an especially divergent domain, with only 25% identity to the corresponding region of the NωV capsid protein. Agrawal and Johnson (1992) aligned the NωV and nodavirus capsid proteins. Based on that alignment, this domain is flanked by the sequences corresponding to the β-sheet structural features βE (residues 339-349) and βF (residues 424-431) of the HaSV capsid protein. It is therefore likely to form the loop of the most prominent surface protrusion on the HaSV capsid. This conclusion rests on comparison with the nodavirus capsid protein structure and capsid structure described by Wery, J.-P. and Johnson, J.E. (1989) Analytical Chemistry 61, 1341A-1350A and Kaesberg, P., et al. (1990) J. Mol. Biol. 214, 423-435. This loop is thought to contain important epitopes. Notably, when capsid protein sequences from a number of nodaviruses are compared, this exterior loop is one of the most variable regions of the nodavirus capsid protein (Kaesberg et al. 1990).
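The positional argument in this paragraph can be checked mechanically. The sketch below encodes the HaSV domain boundaries and percent identities from the domain table, together with the positions of βE and βF. It then confirms that the especially divergent segment (residues 351-411) lies inside the least-conserved domain and between the two β-strands. Residue numbers are taken from the text; the data layout and function name are illustrative.

```python
# HaSV capsid domains: (first residue, last residue, % identity to NωV)
HASV_DOMAINS = [(1, 49, 37), (50, 272, 81), (273, 435, 34), (437, 647, 81)]
BETA_E = (339, 349)
BETA_F = (424, 431)
DIVERGENT = (351, 411)  # segment with ~25% identity to NωV


def domain_containing(segment, domains):
    """Return the domain that fully contains `segment`, or None."""
    start, end = segment
    for first, last, identity in domains:
        if first <= start and end <= last:
            return first, last, identity
    return None


least_conserved = min(HASV_DOMAINS, key=lambda d: d[2])
host = domain_containing(DIVERGENT, HASV_DOMAINS)
between_strands = BETA_E[1] < DIVERGENT[0] and DIVERGENT[1] < BETA_F[0]

print("least conserved domain:", least_conserved)
print("divergent segment lies in:", host)
print("flanked by βE and βF:", between_strands)
```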

Finally, the present inventors have observed a significant level of immunological cross-reaction on Western blots between antisera against the CryIA(c) Bt toxin and the HaSV capsid protein, whether the protein was obtained from virus or expressed in bacteria. Initial data from the NarI deletion mutant described above suggest that this binding is not to the central variable domain but to other regions of the capsid protein. The only other region of the protein that shows extensive sequence variability is the amino terminus. It cannot be responsible for the binding, because the anti-Bt antisera recognize both the authentic capsid protein and the bacterially expressed protein with an altered amino terminus.
ii) In vitro binding assays

The full-length clones for in vitro translation, which yield highly ^ or ^ labelled proteins, were constructed by replacing the bacterial translation initiation signal in the T7 plasmids above with the more active eukaryotic context sequence from the JHE gene. The labelled capsid protein made by in vitro translation of the in vitro transcripts may be tested for binding to brush border membrane vesicles (BBMVs). Conditions are optimised by testing different procedures. The deletion mutant lacking approximately 125 amino acids of the central region containing the variable domain is also tested, together with other mutants derived from it.

iii) Fusion proteins comprising virus capsid midgut binding domains and other proteins

These tests aim to fuse the binding domain of the HaSV capsid protein to one of two partners. The first option is a large protein, preferably an indigestible one that causes protein to aggregate in or on the midgut cells. The second is a toxin domain from another protein that has suitable properties but normally a different binding specificity (e.g. Bt). In initial experiments, the gene for the complete capsid protein has been fused to the GUS gene. A deletion mutant containing essentially only the central portion of the capsid gene has been fused to GUS in the same way. The resulting fusion proteins are being expressed in bacteria and tested for GUS activity. This activity makes them sensitive probes for binding experiments on midgut tissue.

iv) Mapping binding sites using Bt/HaSV fusion proteins

Analysis of deletion mutants of the CryIA(c) Bt toxin has identified domains that may determine the host specificity of this toxin by acting as receptor-binding sites (Schnepf et al. (1990) J. Biol. Chem. 265: 20923-20930; Li et al. (1991) Nature 353: 815-21). The present inventors have obtained a clone of this toxin gene. Deletion mutants corresponding to those identified by Schnepf et al. are being constructed. Segments of the HaSV capsid protein gene can then be inserted into these mutants, the proteins expressed in bacteria, and their insecticidal activity assayed.

EXAMPLE 7

VIRAL GROWTH IN CELL CULTURE

Materials & Methods

A. Cell lines

The following cultured insect cell lines were tested for infection by HaSV: Drosophila melanogaster, Helicoverpa armigera (ovarian derived), Heliothis zea (ovarian derived), Plutella xylostella and Spodoptera frugiperda (SF9).

All lines were grown under standard conditions. When the cells reached confluence, the culture medium was removed and each monolayer was covered with 1.5 ml of cell culture medium containing diluted HaSV. The average multiplicity of infection (M.O.I.) was 10*. After adsorption at 26 °C for 2 h, the inoculum was removed and the cells were carefully washed twice with phosphate buffered saline (pH 7.0). Incubation was then continued in 5 ml of TC199 culture medium (Cyto Systems) containing 10% foetal calf serum.

B. Northern Blotting Analysis.

Virus replication in all the above cell lines was confirmed by Northern blotting analysis. Total RNA was extracted from infected cells by the method of Chomczynski and Sacchi (1987) Anal. Biochem. 162: 156-159. The cells were lysed in 1 ml of lysis solution (4 M guanidinium thiocyanate, 25 mM sodium citrate, pH 7, 0.3% sarcosyl, 0.1 M 2-mercaptoethanol). The following were then added in order, with thorough mixing after each reagent: 0.1 ml of 2 M sodium acetate, pH 4; 1 ml of phenol (equilibrated with 0.2 M sodium acetate); and 0.2 ml of chloroform-isoamyl alcohol mixture (49:1). The mixture was vortexed for 10 s and cooled on ice for 15 min. Tubes were centrifuged in an Eppendorf centrifuge at 14k for 15 min at 4 °C for at least 15 min to allow RNA precipitation. RNA was pelleted by centrifugation at 14k for 15 min and washed with 0.6 ml of ice-cold 70% ethanol. It was then pelleted again (10k, 10 min), air dried at room temperature and resuspended in Millipore water treated with DEPC (Sigma). RNA was subjected to denaturing agarose gel electrophoresis in the presence of formaldehyde according to Sambrook et al. (1989). The gel was Northern transferred to a Zeta-Probe membrane (Bio-Rad) as described by Sambrook et al. (1989). The probe was prepared by random priming of the 3' sequences of the HaSV genome, using the DNA and cDNA clones pSHVR1SGB and pT7T2p71SR-1, according to the manufacturer's instructions (Boehringer-Mannheim). Hybridization followed the standard DNA probe protocol in the literature supplied with the Zeta-Probe membrane (Bio-Rad).
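If the volume of lysis solution is not exactly 1 ml, the reagent additions in the extraction above scale linearly. The sketch below encodes the per-millilitre ratios given in the text and scales them to any lysate volume. It covers only the three additions listed, not the centrifugation or washing steps, and the function name is hypothetical.

```python
# Volumes (ml) added per 1 ml of guanidinium thiocyanate lysis solution,
# in the order given in the protocol above.
PER_ML_LYSIS = [
    ("2 M sodium acetate, pH 4", 0.1),
    ("phenol (0.2 M sodium acetate equilibrated)", 1.0),
    ("chloroform-isoamyl alcohol (49:1)", 0.2),
]


def scaled_additions(lysis_ml: float) -> list[tuple[str, float]]:
    """Scale each reagent volume (ml) to the given volume of lysis solution."""
    if lysis_ml <= 0:
        raise ValueError("lysis volume must be positive")
    return [(reagent, round(ratio * lysis_ml, 3)) for reagent, ratio in PER_ML_LYSIS]


for reagent, ml in scaled_additions(0.5):
    print(f"{ml:.3f} ml  {reagent}")
```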



C. Vectors

Vectors as described below. 

Results 

HaSV has been found to replicate in several continuous cell lines, of which the best is the Spodoptera frugiperda line SF9. Time-course assays by Northern blotting in SF9 cells have shown that RNA 1 replication is clearly detectable within a few hours of infection. Early in infection, RNA 2 is present only in very small amounts, and it accumulates much more slowly than RNA 1. This is consistent with an earlier observation in HaSV-infected larvae, in which RNA 2 replication was not observed until 3 days after infection.

Some apparent replication was also observed in Drosophila cells (DL2). Unlike in the lepidopteran cell lines above, however, more RNA 2 replication was observed at the early time points.

Plasmids that express the HaSV genome as RNA transcripts from full-length cDNA clones have been constructed and tested. These clones were constructed by PCR and carefully checked, and they have restriction sites immediately adjacent to the ends of the sequence. Transcription is driven from a specially re-engineered Drosophila HSP70 promoter.
i) Constructs for expression in insect cells

The constructs are based on vectors carrying the Drosophila HSP or actin promoters and suitable polyadenylation signals from Drosophila (Corces & Pellicer (1984) J. Biol. Chem. 259: 14812-14817) or SV40 (Angelichio et al. (1991) Nucl. Acids Res. 19: 5037-5043). Transcription from such plasmids generates viral RNAs carrying long 3'-terminal extensions derived from sequences in the polyadenylation signal fragment. The transcript must therefore be cleaved immediately after the 3' sequence of the viral RNA. These plasmids gave no virus replication, presumably because of the 3'-terminal extension. The preferred way to obtain authentic 3' termini is to introduce DNA sequences encoding a cis-acting ribozyme into the constructs. With suitable engineering, such a ribozyme cleaves the transcript immediately 3' to the viral sequences. Suitable ribozymes have been designed, based on the hepatitis delta virus ribozyme (Been, M.D., Perrotta, A.T. & Rosenstein, S.P. Biochemistry 31, 11843-52 (1992)) or the hairpin cassette ribozyme (Altschuler, M., Tritz, R. & Hampel, A. Gene 122, 85-90 (1992)). To build them, overlapping oligonucleotides are synthesised, annealed and end-filled with the Klenow fragment of DNA polymerase, which creates short DNA fragments encoding the desired ribozyme. These fragments carry restriction sites at their termini. The sites allow each fragment to be ligated into a plasmid between the viral RNA cDNA (which has a 3' restriction site added by PCR) and the restriction fragment carrying the polyadenylation signal.
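The annealing and Klenow fill-in step can be modelled with string operations. The 3' end of the first oligonucleotide pairs with the 3' end of the second, and fill-in extends each strand across its partner. The sketch below applies this model to the hepatitis delta ribozyme oligonucleotides RZHDV1 and RZHDV2 listed in Example 4, reading the stray "r" in RZHDV2 as T. It finds the longest overlap, builds the filled-in top strand, and checks for the ClaI (ATCGAT) and XbaI (TCTAGA) sites near the two termini. The function names and the minimum-overlap parameter are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_overlap(top: str, bottom_rc: str, min_len: int = 8) -> int:
    """Length of the longest suffix of `top` that equals a prefix of `bottom_rc`."""
    for n in range(min(len(top), len(bottom_rc)), min_len - 1, -1):
        if top.endswith(bottom_rc[:n]):
            return n
    return 0


def fill_in_product(oligo1: str, oligo2: str) -> str:
    """Top strand of the duplex formed when two 5'->3' oligos anneal at their
    complementary 3' ends and both are extended by Klenow fragment."""
    rc2 = reverse_complement(oligo2)
    n = longest_overlap(oligo1, rc2)
    if n == 0:
        raise ValueError("oligonucleotides do not overlap")
    return oligo1 + rc2[n:]


RZHDV1 = "CCATCGATGATCCAGCCTCCTCGCGGCGCCGGATGGGCA"
RZHDV2 = "GCTCTAGATCCATTCGCCATCCGAAGATGCCCATCCGGC"

duplex_top = fill_in_product(RZHDV1, RZHDV2)
print(len(duplex_top), duplex_top)
print("ClaI near 5' end:", "ATCGAT" in duplex_top[:10])
print("XbaI near 3' end:", "TCTAGA" in duplex_top[-10:])
```

With the sequences exactly as listed, the two oligonucleotides share a 12-nt overlap and fill in to a 66-bp duplex with ClaI and XbaI ends. These figures change if any of the listed sequences differ.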

EXAMPLES 

25 SHEDDING OF INFECTED CELLS 

Materials & Methods 

A Confocal Laser Scaiming Microscopy. (CLSM) 
CLSM enables the visualisation and analysis of three-dimensional cell and
tissue structures at the macro and molecular levels. The Leica CLSM used in
this example is based on an MC 68020/68881 VME bus (20 MHz) with a
standard 2 Mbyte framestore, 4 Mbyte RAM and the OS9 operating system, with
programmes written in C. It incorporates a Leica Diaplan research
microscope and, using X10/0.45, X25/0.75, X40/1.30 and X63/1.30 Fluotar
objectives, has a claimed optical efficiency better than 90%. The confocal
pinhole is software-controlled over the range of 20 to 200 µm. Excitation at
488 and 514 nm is provided by a 2 to 50 mW argon-ion laser.
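To relate the 20-200 µm pinhole range to the objectives listed, pinhole size can be expressed in Airy units (AU), where 1 AU = 1.22 λ M / NA is the diameter of the Airy disk projected onto the pinhole plane. The sketch below is illustrative only: it reads the objective apertures as 0.45, 0.75, 1.30 and 1.30, and assumes the objective magnification M is the only magnification between specimen and pinhole. Real instruments often add an intermediate magnification, exposed here as a hypothetical `extra_mag` parameter.

```python
# Sketch: express a confocal pinhole diameter in Airy units (AU).
# Assumption: the only magnification between specimen and pinhole is the
# objective's, times an optional instrument factor `extra_mag` (hypothetical).


def airy_unit_um(wavelength_nm: float, magnification: float, na: float,
                 extra_mag: float = 1.0) -> float:
    """Diameter of one Airy disk at the pinhole plane, in micrometres."""
    wavelength_um = wavelength_nm / 1000.0
    return 1.22 * wavelength_um * magnification * extra_mag / na


def pinhole_in_au(pinhole_um: float, wavelength_nm: float,
                  magnification: float, na: float,
                  extra_mag: float = 1.0) -> float:
    """Pinhole diameter divided by the Airy unit for this optical setup."""
    return pinhole_um / airy_unit_um(wavelength_nm, magnification, na, extra_mag)


# Objectives (magnification, NA) and laser lines as read from the text.
OBJECTIVES = [(10, 0.45), (25, 0.75), (40, 1.30), (63, 1.30)]
LASER_LINES_NM = [488, 514]

if __name__ == "__main__":
    for mag, na in OBJECTIVES:
        for wl in LASER_LINES_NM:
            lo = pinhole_in_au(20, wl, mag, na)
            hi = pinhole_in_au(200, wl, mag, na)
            print(f"{mag}x/{na:.2f} at {wl} nm: 1 AU = "
                  f"{airy_unit_um(wl, mag, na):.1f} um, "
                  f"pinhole range {lo:.2f}-{hi:.2f} AU")
```

Under these assumptions, with the X63/1.30 objective at 488 nm one Airy unit is about 28.9 µm, so the pinhole range spans roughly 0.7 to 6.9 AU.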
B. Immunocytochemistry (ICC).

For whole-mount ICC, tissues were dissected under saline and fixed in fresh
4% formaldehyde in phosphate-buffered saline (PBS) for at least 15 min.
After multiple washes in PBS they were permeabilized by 60 min incubation in
PBT (PBS with 0.1% Triton X-100 plus 0.2% bovine serum albumin). After
30 min blocking in PBT+N (5% normal goat serum), tissue was incubated in
primary antibody diluted 1:40 in PBT+N for at least 2 h at room temperature,
then at 4 °C overnight. After extensive washing in PBT and 30 min blocking in
PBT+N, the FITC-conjugated secondary antibody, diluted 1:60 in PBT+N, was
applied for 2 h at room temperature plus overnight at 4 °C. After multiple
washes in PBT and PBS, the tissue was cleared in 70% glycerol and mounted in
0.01% w/v p-phenylenediamine (Sigma #P1519) dissolved in 70% glycerol. All
processing was at room temperature unless otherwise stated.
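The dilutions and concentrations in this protocol translate into simple volume and mass calculations. This is a minimal sketch, assuming that "1:40" means one volume of antibody per 40 volumes of final solution (some labs read it as 1 + 40) and that "% w/v" means grams per 100 mL; the function names are hypothetical.

```python
# Sketch: reagent arithmetic for the ICC protocol above.
# Assumptions (not stated in the text): a "1:40" dilution means 1 volume of
# antibody per 40 volumes of final solution, and "% w/v" means g per 100 mL.


def dilution_volumes(ratio: float, final_volume_ul: float) -> tuple:
    """Return (antibody volume, diluent volume) in uL for a 1:ratio dilution."""
    antibody_ul = final_volume_ul / ratio
    return antibody_ul, final_volume_ul - antibody_ul


def mass_for_percent_wv(percent: float, volume_ml: float) -> float:
    """Milligrams of solute needed; 1% w/v = 1 g per 100 mL = 10 mg per mL."""
    return percent * 10.0 * volume_ml


if __name__ == "__main__":
    print(dilution_volumes(40, 1000))     # primary antibody in 1 mL PBT+N
    print(dilution_volumes(60, 1200))     # FITC secondary in 1.2 mL PBT+N
    print(mass_for_percent_wv(0.01, 10))  # p-phenylenediamine for 10 mL mountant
```

Under these conventions, 1 mL of primary antibody solution needs 25 µL of antiserum in 975 µL of PBT+N, and 10 mL of mountant needs about 1 mg of p-phenylenediamine.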



Results

The inventors' current model for the effect of HaSV involves the detection by
the insect midgut of infected cells, their identification as infected, and their
subsequent shedding in numbers sufficient to cause irreparable damage to the
insect midgut. The evidence for this is based on the above and on the
following direct observation of the fate of infected cells in midgut tissue over
1-3 days post infection. These results in repeat experiments were complicated
by the discovery that another, unrelated virus was present in the larval
population being tested. Preliminary findings indicated that HaSV infection
activates or facilitates pathogenesis of the unrelated virus, and together these
cause severe disruption of the larval gut cells. Thus these two agents appear
to act synergistically in causing gut cell disruption.
Midguts from larvae infected with HaSV were treated with the antiserum to
purified HaSV particles (above) and examined under the laser confocal
microscope (described above). This established that some midgut cells were
sufficiently infected with HaSV to give strong fluorescence signals. Such cells
were, moreover, clearly separating from the surrounding tissue, a sign that they
were in the process of being shed.

Similar observations have been made with other insect viruses (Flipsen et al.
(1992) Society for Invertebrate Pathology Abstract #96), although in these
cases the effect is too localised and weak to cause any anti-feeding effect.
Apparently only the small RNA viruses of the Tetraviridae, which are localised
to the gut and cause more-or-less severe anti-feeding effects in their hosts
(Moore, N.F. in Kurstak, E. (Ed.) (1991) Viruses of Invertebrates. Marcel
Dekker, New York, pp. 277-285), are capable of such an effect to an extent
sufficient for pest control.

Following on from the immunofluorescence work, in situ hybridization can be
carried out to detect RNA replication in infected cells. Furthermore, larvae
infected with a recombinant HaSV expressing a foreign gene at early stages
(by insertion of that gene into RNA 1 in place of the N-terminal portion of the
replicase gene) can be studied. A correlation between virus replication and cell
rejection can be confirmed by histochemical analysis of the midgut cells of the
infected larvae. Thus the cell-shedding phenomenon offers a direct and rapid
assay for early events in HaSV-infected gut tissue. Extracts of baculovector-
infected insect cells carrying empty HaSV particles can be fed to larvae
directly and the midgut examined by toluidine blue staining and
immunofluorescence at intervals after infection. This will allow direct
determination of whether the particles can bind the brush border membranes
in intact gut, and whether such binding can induce the massive disruption
evident in normally infected larvae. Control experiments using extracts from
cells infected with the baculovector alone can be conducted to observe and
distinguish effects due to the vector. The immunofluorescence assay on midgut
tissue allows analysis of binding to midgut brush border membranes. Once
binding has been determined for wild-type capsid protein expressed from a
baculovector, deletion or replacement mutants can be inserted into the
baculovector. Suitable cell extracts from these can be used to infect larvae.



EXAMPLE 9
ENGINEERED VIRUS AND USES

Materials & Methods

(as indicated in earlier Examples)

i) Engineered virus as a vector for other toxin genes

This involves placing suitable genes under the control of HaSV replication and
encapsidation signals. Genes which may be suitable include those for
intracellular insect toxins such as ricin, neurotoxins, gelonin and diphtheria
toxins. The toxin gene may be placed in the viral genome such that it is a
silent (downstream) cistron on a polycistronic RNA, or in a minus-strand
orientation, requiring replication by the viral polymerase to be expressed.
Standard techniques in molecular biology can be used to engineer these
vectors.

Two recombinant HaSV vectors which have been designed are discussed below.

For RNA 1:

The reporter gene (or one of the toxin genes mentioned above) is inserted in
place of the amino-terminal portion of the putative replicase gene, such that
the initiation codon used for the replicase (i.e. that at nucleotides 37-39 of the
sequence) is now used to commence reporter gene translation. The fusion is
achieved by the use of artificial NcoI restriction sites common to both
sequences.

case) is synthesised as the following sequence:

ggggatccacaGTTCTGCCTCCCCCGGACGGTAAATATAGGGGAACCATGGtctagagg,
using two overlapping oligonucleotides comprising the first 40 nucleotides and
the complement of the last 40 nucleotides respectively. These primers are
annealed and end-filled by Klenow. The resulting fragment is then cut with
BamHI and XbaI (sites ggatcc and tctaga, respectively) and cloned into plasmid
vector pBSIISK(-).
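The design of this leader fragment can be checked mechanically: the restriction sites, the two 40-nt oligonucleotides and their overlap, and the position of the replicase ATG. This is a minimal sketch, not the inventors' procedure. It reads the garbled "GTr" in the printed sequence as "GTT", which matches the start of the RNA 1 sequence in the listing. The text says "the complement" of the last 40 nucleotides; an annealing partner written 5' to 3' is the reverse complement, and that is what the sketch computes. All helper names are hypothetical.

```python
# Sketch: check the synthetic RNA 1 leader oligo described above.
# The sequence is transcribed from the text, reading "GTr" as "GTT".
# Function and constant names are illustrative only.

LINKER_5P = "ggggatccaca"   # BamHI site plus spacer (lower case in the text)
VIRAL_LEADER = "GTTCTGCCTCCCCCGGACGGTAAATATAGGGGAACCATGG"  # leader + NcoI
LINKER_3P = "tctagagg"      # XbaI site plus clamp
SYNTHETIC_LEADER = (LINKER_5P + VIRAL_LEADER + LINKER_3P).upper()

SITES = {"BamHI": "GGATCC", "NcoI": "CCATGG", "XbaI": "TCTAGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, sites: dict) -> dict:
    """1-based start position of every occurrence of each recognition site."""
    hits = {}
    for name, motif in sites.items():
        positions, start = [], seq.find(motif)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(motif, start + 1)
        hits[name] = positions
    return hits


def overlapping_oligos(seq: str, length: int = 40):
    """Forward oligo = first `length` nt; reverse oligo = reverse complement
    of the last `length` nt. Also returns the length of the annealing overlap."""
    forward = seq[:length]
    reverse = reverse_complement(seq[-length:])
    overlap = 2 * length - len(seq)
    return forward, reverse, overlap


if __name__ == "__main__":
    print(len(SYNTHETIC_LEADER), find_sites(SYNTHETIC_LEADER, SITES))
    fwd, rev, overlap = overlapping_oligos(SYNTHETIC_LEADER)
    print(fwd, rev, f"overlap = {overlap} nt", sep="\n")
    # Viral nucleotide 1 is the first base after the 5' linker.
    atg = SYNTHETIC_LEADER.find("ATG") - len(LINKER_5P) + 1
    print("ATG at viral nt", atg)
```

With the sequence as transcribed, this reports a 59-nt product with single BamHI, NcoI and XbaI sites starting at positions 3, 46 and 52, a 21-nt overlap between the two 40-mers, and the ATG at viral nucleotide 37, consistent with the text.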

The GUS gene carrying an NcoI site at the ATG codon was obtained as an NcoI-
SacI fragment from plasmid pRAJ275 (Jefferson, R.A. Plant Mol. Biol. Rep. 5,
387-405 (1987)). This SacI site is located just downstream from the coding
sequence for the GUS gene.

The 5' leader of RNA 1 is excised as a BamHI-NcoI fragment from the above
vector and is ligated, together with the NcoI-SacI fragment carrying the GUS
gene, into plasmid pHSPR1RZ or pDHVR1RZ, carrying the full-length cDNA
insert of RNA 1 (see above), which has been cut with BamHI and SacI. The
resulting plasmid then carries a complete form of RNA 1 but with the amino-
terminal portion of the replicase gene substituted by the GUS gene. It is
desirable to produce a construct of approximately the same size as RNA 1
for encapsidation purposes.

Similar approaches are adopted for RNA 2, with the foreign, reporter or toxin
gene fused to the initiation codon of either P17 or P71. In either case the
context sequence of the introduced gene is modified to give the necessary
expression level of that protein. The foreign gene is introduced into plasmids
pHSPR2RZ or pDHVR2RZ.

The above recombinants have been described specifically as insertions of a
reporter gene (GUS). The toxin genes to be inserted are described on page 13
of the specification. These preferably also require a signal peptide
sequence added at the amino-terminus of the protein.
ii) Capsid technology

Identification of encapsidation (and replication) signals on virus RNA allows
design of RNAs which can be encapsidated in HaSV particles during assembly
of virus in a suitable production system. The virus capsids then carry the RNA
of choice into the insect's midgut cells, where the RNA can perform its
intended function. Examples of RNAs which may be encapsidated in this
manner include RNAs for specific toxins, such as intracellular toxins like
ricin, gelonin, diphtheria toxins or neurotoxins. This strategy relies on the
resistance of the virus particle to the harsh gut environment.



iii) Other uses of the capsid particle

The capsid particles can be used as vectors for protein toxins. Knowledge of the
icosahedral particle structure elucidated by the inventors suggests that the
amino and especially the C-termini are present within the capsid interior. It is
possible to replace or modify the amino acid sequence corresponding to P7
such that it encodes a suitable protein toxin which is cleaved off the bulk of
the capsid protein during capsid maturation. As with toxin-encoding mRNAs,
the HaSV capsid delivers it to the midgut cells of the feeding insect, where it
exerts the desired toxic effect.

iv) Use of HaSV in plants 

The uses of HaSV in the production of insect-resistant transgenic plants
are shown in Fig. 12. These inventions are based on the use of either the
complete HaSV genome, or the replicase gene as a tool for the
amplification of suitable amplifiable mRNAs (e.g. encoding a toxin), or the
capsid protein as a means to deliver insecticidal agents. These strategies are
now described in some detail.

a) Use of the complete HaSV genome 
Fragments of cDNA corresponding to the full-length HaSV genome
components, RNAs 1 and 2, are placed in a suitable vector for plant
transformation under the control of either a constitutive plant promoter (e.g.
the CaMV 35S promoter mentioned above), an inducible promoter or a
tissue-specific (e.g. leaf-specific) promoter. The cDNAs are followed by a cis-
cleaving ribozyme and a suitable plant polyadenylation signal. Transcription
and translation of these genes in transgenic plant tissues and cells leads to
assembly of fully infectious virus particles that infect and kill feeding larvae.

A variation on this strategy is to remove from the cDNA for RNA 2 the
fragments encoding RNA encapsidation and/or replication signals. This
results in the assembly in the plant cells of either HaSV particles carrying only
RNA 1, or HaSV particles carrying RNA 1 and a form of RNA 2 which
cannot be replicated in the infected insect cell.

A further variation on this strategy is to modify the plant transgene
encoding RNA 2 so that it is still replicable and encapsidatable, but no
longer expresses functional capsid protein. HaSV capsids made in such plant
cells will then be capable of making both the replicase and P17 in infected
insect cells, but not of assembling progeny virus particles therein (as
shown in Fig. 12(d)). These measures confer inherent biological safety, in the
form of containment, on the use of such transgenic plant material.
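The containment argument in these variations can be summarised as a few rules about what RNA 2 can still do once it reaches an insect cell. The sketch below encodes one reading of the text, not a model given in it: it assumes the RNA 1 transgene is unmodified, and that progeny virus requires RNA 2 to be packaged, replicated and to encode functional capsid protein. The class, field and variant names are hypothetical.

```python
from dataclasses import dataclass

# Sketch of the containment reasoning as simple rules. Assumptions: the RNA 1
# transgene is unmodified, and progeny virus needs RNA 2 to be packaged,
# replicated and to encode functional capsid protein. Names are hypothetical.


@dataclass(frozen=True)
class Rna2Design:
    encapsidation_signal: bool  # can plant-made capsids package this RNA 2?
    replication_signal: bool    # can the HaSV replicase copy it in insect cells?
    functional_capsid: bool     # does it still encode working capsid protein?


def insect_cell_outcome(rna2: Rna2Design) -> dict:
    packaged = rna2.encapsidation_signal
    replicated = packaged and rna2.replication_signal
    return {
        "packaged_rnas": ("RNA 1", "RNA 2") if packaged else ("RNA 1",),
        "rna2_replicated": replicated,
        "progeny_virus": replicated and rna2.functional_capsid,
    }


VARIANTS = {
    "complete genome": Rna2Design(True, True, True),
    "no encapsidation signal": Rna2Design(False, True, True),
    "no replication signal": Rna2Design(True, False, True),
    "no functional capsid": Rna2Design(True, True, False),
}

if __name__ == "__main__":
    for name, design in VARIANTS.items():
        print(f"{name:25s} {insect_cell_outcome(design)}")
```

Under these rules only the complete-genome design yields progeny virus; each modified design stops at a different step, which is the basis of the biological containment described above.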


(b) Use of portions of HaSV genome to deliver toxins to insect cells 

This approach makes use of any of the systems described in (a) above.
Plant cells contain an additional transgene encoding a suitable insect-specific,
intracellular toxin (as described above). Such a toxin gene is expressed by
plant RNA polymerase in either a positive or a negative sense (the latter is
preferred) and in such a form that the RNA can be encapsidated by HaSV
capsid protein and/or replicated by the HaSV replicase in infected insect cells
(see Figs. 12a and 12b).

Transgenic plants would contain two different transgenes, making either
unmodified capsid protein precursor or a modified form in which most of the
carboxy-terminal protein P7 is replaced by a suitable insect-specific toxin, or
one which is inactive as part of a fusion protein. (Gelonin or other ribosome-
inactivating proteins, insect gut toxins or neurotoxins may be suitable here.)
Expression from these two transgenes would be regulated so that only the
required amounts of the modified and unmodified forms are made in the plant
cell, and assembled in such proportions into the capsids. One way to
modulate the production of capsid-toxin fusion proteins is to make translation
of the carboxy-terminal toxin reading frame dependent on a translational
frameshift or read-through of a termination codon. With an appropriately low
frequency of frameshifting (e.g. 0.1-2%), it could even be sufficient to use a
single transgene, if it were possible to synthesise the P7 portion and the toxin
portion as overlapping genes. Upon assembly (which we have demonstrated in
insect cells using the baculovirus vectors) and maturation, the protein
precursors are cleaved and release the mature P7 and the toxin, which remain
within the capsids. These proteins are not released until capsid disassembly
occurs in insect gut cells. The processed form of the toxin is then able to kill
the pest.
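The 0.1-2% frameshift range can be turned into an expected number of toxin-fusion subunits per particle. This is an illustrative calculation only, under two assumptions not stated in the text: that each capsid contains 240 protein subunits (a T=4 icosahedral shell, as described for tetraviruses), and that each subunit is independently a fusion with probability equal to the frameshift frequency.

```python
# Sketch: expected toxin-fusion load per capsid for a given frameshift
# (or read-through) frequency. Assumptions, not from the text:
#   * the capsid contains 240 protein subunits (T=4, as for tetraviruses);
#   * each subunit is independently a fusion with probability = frequency.

SUBUNITS_PER_CAPSID = 240


def fusion_load(frequency: float, subunits: int = SUBUNITS_PER_CAPSID):
    """Return (mean fusion subunits per capsid, P(capsid has >= 1 fusion))."""
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be a fraction between 0 and 1")
    mean = frequency * subunits
    p_at_least_one = 1.0 - (1.0 - frequency) ** subunits  # binomial model
    return mean, p_at_least_one


if __name__ == "__main__":
    for f in (0.001, 0.005, 0.01, 0.02):  # spans the 0.1-2% range in the text
        mean, p = fusion_load(f)
        print(f"{f:.1%}: {mean:.2f} fusions/capsid, "
              f"{p:.1%} of capsids carry toxin")
```

Under these assumptions, 0.1% frameshifting gives about 0.24 fusion subunits per capsid (roughly 21% of capsids carrying at least one toxin), while 2% gives about 4.8 per capsid (over 99% of capsids).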

(c) HaSV particles devoid of nucleic acid carrying one or more suitable 
protein toxins and/or their mRNA 
A protein toxin (or toxins) is expressed as a fusion with the capsid
protein. The fusion protein then assembles into capsids carrying the toxin(s).
These capsids, present in the plant tissue, exert an antifeeding effect on insects
attacking the plant.

EXAMPLE 10

EXPRESSION OF HaSV IN OTHER DELIVERY VECTORS
Materials & Methods 
(as indicated in earlier Examples) 

Constructs similar to those for plant expression are introduced into yeast or
bacteria by standard techniques. Virus particles are assembled as either fully
infectious virus or any of the modified or biologically contained forms
described in Example 9. Microbes produced in suitable fermentation or
culture facilities and carrying such forms of the virus are then delivered to the
crop by spraying. The microbial cell wall provides extra protection for the
virus particles produced within the microbe.

Well-established techniques exist for culture and transformation of yeast
(Ausubel, F.M. et al. (eds) Current Protocols in Molecular Biology, J. Wiley &
Sons, NY, 1989). An example of a yeast expression vector is pBM272, which
contains the URA3 selectable marker (Johnston, M. & Davis, R.W. Mol. Cell.
Biol. 4, 1440-8 (1984); Stone, D. & Craig, E. Mol. Cell. Biol. 10, 1622-32
(1990)). Another example of an expression vector is pRJ28, carrying the Trp1
and Leu2 selectable markers.

Yeast has recently been shown to support replication of RNA replicons
derived from a plant RNA virus, brome mosaic virus (BMV) (Janda, M. &
Ahlquist, P. Cell 72, 961-70 (1993)). Since the BMV replicase is distantly
related to that of HaSV, and the two viruses are likely to replicate by similar
strategies within cells, yeast cells probably contain all the cellular factors
required for HaSV to generate infectious virus.

For bacteria, suitable expression vectors have been described above.
SEQUENCE LISTING.



(1) GENERAL INFORMATION: 

(i) APPLICANT: Commonwealth Scientific and Industrial Research 
Organisation and 
Pacific Seeds Pty. Ltd.
(ia) INVENTORS: P. D. CHRISTIAN, K. H. J. GORDON and T. N. HANZLIK

(ii) TITLE OF INVENTION: INSECT VIRUSES AND THEIR USES IN 

PROTECTING PLANTS 

(iii) NUMBER OF SEQUENCES: 52 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: DAVIES COLLISON CAVE 

(B) STREET: 1 LITTLE COLLINS STREET

(C) CITY: MELBOURNE 

(D) STATE: VICTORIA 

(E) COUNTRY: AUSTRALIA 

(F) ZIP: 3000 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS /MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 13 AUGUST 1993 

(C) CLASSIFICATION: 

(viii) ATTORNEY/ AGENT INFORMATION: 

(A) NAME: JOHN M. SLATTERY 

(B) REGISTRATION NUMBER: NA 

(C) REFERENCE/DOCKET NUMBER: 1613611 

(ix) TELECOMMUNICATION INFORMATION: 
(A) TELEPHONE: (613) 254 2777 
(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



GGATCCACAG NNN 13 



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:



ATGGGCGATG CCGGCGTCGC GTTCACAG 28 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:



ATGGAGGATG CTGGAGTGGC GTCACAG 27 



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 



ATGAGCGAGG CCGGCGTCGC GTCACAG 
(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

CCATCGATGC CGGACTGGTA TCCCAGGGGG 30 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 



CCATCGATGC CGGACTGGTA TCCCGAGGGA C 31 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 



CCATCGATGA TCCAGCCTCC TCGCGGCGCC GGATGGGCA 39 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GCTCTAGATC CATTCGCCAT CCGAAGATGC CCATCCGGC 39 
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 



CCATCGATTT ATGCCGAGAA GGTAACCAGA GAAACACAC 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 



GCTCTAGACC AGGTAATATA CCACAACGTG TGTTTCTCT 39 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 



GGGGGGAATT CATTTAGGTG ACACTATAGT TCTGCCTCCC CGGAC 
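The stated LENGTH of each nucleotide entry can be cross-checked against the sequence itself, which is a cheap way to catch transcription errors in a copied listing. For example, counting the SEQ ID NO: 11 sequence gives 45 nucleotides. This is a minimal sketch with hypothetical helper names; it assumes entries are written as grouped letters, optionally followed by a running residue count, and does not apply to the three-letter amino acid entries.

```python
import re

# Sketch: cross-check a listing entry's stated LENGTH against its sequence.
# Assumes nucleotide entries written as grouped letters, optionally followed
# by a running residue count (as in "GGATCCACAG NNN 13").


def observed_length(sequence_text: str) -> int:
    """Count residue letters; spaces and trailing digits are ignored."""
    return len(re.sub(r"[^A-Za-z]", "", sequence_text))


def length_matches(stated: int, sequence_text: str) -> bool:
    return observed_length(sequence_text) == stated


if __name__ == "__main__":
    entries = {
        1: (13, "GGATCCACAG NNN 13"),
        11: (45, "GGGGGGAATT CATTTAGGTG ACACTATAGT TCTGCCTCCC CGGAC"),
        14: (46, "GCGGGATCCG ATGGTATCCC GAGGGACGCT CAGCAGGTGG CATAGG"),
    }
    for seq_id, (stated, text) in entries.items():
        print(seq_id, stated, observed_length(text),
              length_matches(stated, text))
```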



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GGGGGGATCC TGGTATCCCA GGGGGGC 
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 



CCGGAAGCTT GTTTTTCTTT CTTTACCA 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
GCGGGATCCG ATGGTATCCC GAGGGACGCT CAGCAGGTGG CATAGG 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 52 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:



AAATAATTTT GTTACTTTAG AAGGAGATAT ACATATGAGC GAGCGAGCAC AC 52 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 55 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 



AAATAATTTT GTTTAACCTT AAGAAGGAGA TCTACATATG CTGGAGTGGC GTCAC 55 
(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 



GGAGATCTAC ATATGGGAGA TGCTGGAGTG 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 



GTAGCGAACG TCGAGAA 



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 



GGGGGATCCT CAGTTGTCAG TGGCGGGGTA G 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 



GGGGATCCCT AATTGGCACG AGCGGCGC 
(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

AATTACATAT GGCGGCCGCC GTTTCTGCC 29 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
AATTACATAT GTTCGCGGCC GCCGTTTCT 29 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein - N terminal 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

Phe Ala Ala Ala Val Ser Ala Phe Ala Ala Asn Met Leu Ser Ser Val 
1 5 10 15 

Leu Lys Ser 



SUBSTITUTE SHEET 



WO 94/04660

PCT/AU93/00411



83 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein - internal 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

Pro Thr Leu Val Asp Gln Gly Phe Trp Ile Gly Gly Gln Tyr Ala Leu
1 5 10 15 

Thr Pro Thr Ser 
20 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein - internal 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 



Phe Ala Ala Ala Val Ser 
1 5 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 



GCGCCCCCUG GGAUACCAGG AUG 
(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 



TCAGCAGGTG GCATAGG 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 6..32

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 



CCCAT ATG GGC GAT GCC GGC GTC GCG TCA CAG 
Met Gly Asp Ala Gly Val Ala Ser Gln



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein - N-terminal 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 



Met Gly Asp Ala Gly Val Ala Ser Gln
1 5 
(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 6..32

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 



CCCAT ATG AGC GAG GCC GGC GTC GCG TCA CAG 32 
Met Ser Glu Ala Gly Val Ala Ser Gln
1 5 



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein - N-terminal 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 



Met Ser Glu Ala Gly Val Ala Ser Gln
1 5 



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..27

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 



ATG GAG GAT GCT GGA GTG GCG TCA CAG 27 
Met Glu Asp Ala Gly Val Ala Ser Gln
1 5 
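Several coding regions in the listing give both the nucleotide sequence and its three-letter translation, so the two can be checked against each other using the standard genetic code. This is a minimal sketch with hypothetical helper names; it reports every codon whose translation disagrees with the stated residue and expects standard three-letter codes (e.g. "Gln", "Ile").

```python
# Sketch: check a CDS in the listing against its stated three-letter
# translation, using the standard genetic code. Names are illustrative only.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def translation_mismatches(cds: str, stated: str) -> list:
    """Return (codon number, codon, stated residue, encoded residue) for every
    codon whose translation disagrees with the stated residue."""
    codons, residues = cds.split(), stated.split()
    if len(codons) != len(residues):
        raise ValueError("codon and residue counts differ")
    mismatches = []
    for n, (codon, three) in enumerate(zip(codons, residues), start=1):
        encoded = CODON_TABLE[codon.upper()]
        if encoded != THREE_TO_ONE[three]:
            mismatches.append((n, codon, three, encoded))
    return mismatches


if __name__ == "__main__":
    # SEQ ID NO: 28 coding region: translation agrees with the listing.
    print(translation_mismatches("ATG GGC GAT GCC GGC GTC GCG TCA CAG",
                                 "Met Gly Asp Ala Gly Val Ala Ser Gln"))
    # First codons of SEQ ID NO: 39 as transcribed: TAG is a stop codon,
    # not Tyr, so this flags a likely transcription error (TAC or TAT).
    print(translation_mismatches("ATG TAG GCG AAA GCG ACA",
                                 "Met Tyr Ala Lys Ala Thr"))
```

The first call returns an empty list; the second returns `[(2, 'TAG', 'Tyr', '*')]`, pointing at the second codon of SEQ ID NO: 39 as a place where the transcribed nucleotide sequence and its stated translation disagree.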
(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids
(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein - N-terminal 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 



Met Glu Asp Ala Gly Val Ala Ser Gln
1 5 



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 



GGGGGATCCC GCGGATTTAT GAGCGAG 27 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 



GGGGGATCCC GCGGAGACAT GAGCGAGCAC AC 32 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 



GGGGGATCCA GCGACATGAG AGATGCTGGA GTGG 34 
(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: 



GGGGGATCCA GCGACATGAG AGATGCTGGA GTGG 34 



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
GGGGGATCCG TTCTGCCTCC CCGGAC 26 

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5312 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 37..5145

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

GTTCTGCCTC CCCCGGACGG TAAATATAGG GGAACA ATG TAG GCG AAA GCG ACA 54 

Met Tyr Ala Lys Ala Thr 
1 5 

GAC GTG GCG CGT GTC TAG GCC GCG GCA GAT GTC GCC TAG GCG AAC GTA 102 
Asp Val Ala Arg Val Tyr Ala Ala Ala Asp Val Ala Tyr Ala Asn Val 
10 15 20 

CTG GAG GAG AGA GCA GTC AAG TTG GAC TTC GCC GCG CCA GTG AAG GCA 150 
Leu Gln Gln Arg Ala Val Lys Leu Asp Phe Ala Pro Pro Leu Lys Ala
25 30 35 

CTA GAA ACC GTC CAC AGA CTG TAG TAT GCG CTG CGC TTC AAA GGG GGC 198 
Leu Glu Thr Leu His Arg Leu Tyr Tyr Pro Leu Arg Phe Lys Gly Gly 
40 45 50 
ACT TTA CCC CCG ACA CAA CAC CCG ATC CTG GCC GGG CAC CAA CGT GTC 246
Thr Leu Pro Pro Thr Gln His Pro Ile Leu Ala Gly His Gln Arg Val
55 60 65 70 

GCA GAA GAG GTT CTG CAC AAT TTC GCC AGG GGA CGT AGC ACA GTG CTC 294 
Ala Glu Glu Val Leu His Asn Phe Ala Arg Gly Arg Ser Thr Val Leu 
75 80 85 

GAG ATA GGG CCG TCT CTG CAC AGC GCA CTT AAG CTA CAT GGG GCA CCG 342 
Glu Ile Gly Pro Ser Leu His Ser Ala Leu Lys Leu His Gly Ala Pro
90 95 100 

AAC GCC CCC GTC GCA GAG TAT CAC GGG TGC ACC AAG TAG GGG ACC CGC 390 
Asn Ala Pro Val Ala Asp Tyr His Gly Cys Thr Lys Tyr Gly Thr Arg 
105 110 115 

GAG GGC TCG CGA CAC ATT ACG GCC TTA GAG TCT AGA TCC GTC GCC ACA 438 
Asp Gly Ser Arg His Ile Thr Ala Leu Glu Ser Arg Ser Val Ala Thr
120 125 130 

GGC GGG CCC GAG TTC AAG GCC GAG GCC TCA CTG CTC GCC AAC GGC ATT 486 
Gly Arg Pro Glu Phe Lys Ala Asp Ala Ser Leu Leu Ala Asn Gly Ile
135 140 145 150 

GCC TCC CGC ACC TTC TGC GTC GAG GGA GTC GGC TCT TGC GCG TTC AAA 534 
Ala Ser Arg Thr Phe Cys Val Asp Gly Val Gly Ser Cys Ala Phe Lys 
155 160 165 

TCG CGC GTT GGA ATT GCC AAT CAC TCC CTC TAT GAC GTG ACC CTA GAG 582 
Ser Arg Val Gly Ile Ala Asn His Ser Leu Tyr Asp Val Thr Leu Glu
170 175 180 

GAG CTG GCC AAT GCG TTT GAG AAC CAC GGA CTT CAC ATG GTC CGC GCG 630 
Glu Leu Ala Asn Ala Phe Glu Asn His Gly Leu His Met Val Arg Ala 
185 190 195 

TTC ATG CAC ATG CCA GAA GAG CTG CTC TAC ATG GAC AAC GTG GTT AAT 678 
Phe Met His Met Pro Glu Glu Leu Leu Tyr Met Asp Asn Val Val Asn 
200 205 210 

GCC GAG CTC GGC TAC CGC TTC CAC GTT ATT GAA GAG CCT ATG GCT GTG 726 
Ala Glu Leu Gly Tyr Arg Phe His Val Ile Glu Glu Pro Met Ala Val
215 220 225 230 

AAG GAC TGC GCA TTC GAG GGG GGG GAC CTC CGT CTC CAC TTC CCT GAG 774 
Lys Asp Cys Ala Phe Gln Gly Gly Asp Leu Arg Leu His Phe Pro Glu
235 240 245

TTG GAC TTC ATC AAC GAG AGC CAA GAG CGG CGC ATC GAG AGG CTG GCC 822 
Leu Asp Phe Ile Asn Glu Ser Gln Glu Arg Arg Ile Glu Arg Leu Ala
250 255 260 

GCC CGC GGC TCC TAC TCC AGA CGC GCC GTC ATT TTC TCC GGC GAC GAC 870 
Ala Arg Gly Ser Tyr Ser Arg Arg Ala Val Ile Phe Ser Gly Asp Asp
265 270 275 

GAC TGG GGT GAT GCG TAC TTA CAC GAC TTC CAC ACA TGG CTC GCC TAC 918 
Asp Trp Gly Asp Ala Tyr Leu His Asp Phe His Thr Trp Leu Ala Tyr 
280 285 290 

CTA CTG GTG AGG AAC TAC CCC ACT CCG TTT GGT TTC TCA CTC CAT ATA 966 
Leu Leu Val Arg Asn Tyr Pro Thr Pro Phe Gly Phe Ser Leu His Ile
295 300 305 310 



GAA GTC CAG AGG CGC CAC GGC TCC AGC ATT GAG CTG CGC ATC ACT CGC 1014
Glu Val Gln Arg Arg His Gly Ser Ser Ile Glu Leu Arg Ile Thr Arg
315 320 325 

GCG CCA CCT GGA GAG CGC ATG CTG GCC GTC GTC CCA AGG ACG TCC CAA 1062 
Ala Pro Pro Gly Asp Arg Met Leu Ala Val Val Pro Arg Thr Ser Gln
330 335 340 

GGC CTC TGC AGA ATC CCA AAC ATC TTT TAT TAG GCC GAG GCG TCG GGC 1110 
Gly Leu Cys Arg He Pro Asn He Phe Tyr Tyr Ala Asp Ala Ser Gly 
' 345 350 355 

ACT GAG CAT AAG ACC ATC CTT ACG TCA CAG CAC AAA GTC AAC ATG CTG 1158 
Thr Glu His Lys Thr He Leu Thr Ser Gin His Lys Val Asn Met Leu 
360 365 370 

CTC AAT TTT ATG CAA ACG CGT CCT GAG AAG GAA CTA GTC GAG ATG ACC 1206 
Leu Asn Phe Met Gin Thr Arg Pro Glu Lys Glu Leu Val Asp Met Thr 
375 380 385 390 

GTC TTG ATG TCG TTC GCG CGC GCT AGG CTG CGC GCG ATC GTG GTC GCC 1254 
Val Leu Met Ser Phe Ala Arg Ala Arg Leu Arg Ala He Val Val Ala 
395 400 405 

TCA GAA GTC ACC GAG AGC TCC TGG AAC ATG TCA GCG GCT GAC CTG GTC 1302 
Ser Glu Val Thr Glu Ser Ser Trp Asn He Ser Pro Ala Asp Leu Val 
410 415 420 

CGC ACT GTC GTG TCT CTT TAG GTC CTC CAC ATC ATC GAG CGC CGA AGG 1350 
Are Thr Val Val Ser Leu Tyr Val Leu His He He Glu Arg Arg Arg 
425 430 435 

GCT GCG GTC GCT GTC AAG ACC GCC AAG GAG GAC GTC TTT GGA GAG ACT 1398 
Ala Ala Val Ala Val Lys Thr Ala Lys Asp Asp Val Phe Gly Glu Thr 
4A0 445 450 

TCG TTC TGG GAG AGT CTC AAG CAC GTC TTG GGC TCC TGT TGC GGT CTG 1446 
Ser Phe Trp Glu Ser Leu Lys His Val Leu Gly Ser Cys Cys Gly Leu 
455 - 460 465 470 

CGC AAC CTC AAA GGC ACC GAC GTC GTC TTT ACT AAG CGC GTC GTC GAT 1494 
Are Asn Leu Lys Gly Thr Asp Val Val Phe Thr Lys Arg Val Val Asp 
^ 475 480 485 

AAG TAC CGA GTC CAC TCG CTC GGA GAC ATA ATC TGC GAC GTC CGC CTG 1542 
Lvs Tyr Are Val His Ser Leu Gly Asp He He Cys Asp Val Arg Leu 
' ' 490 495 500 

TCC CCT GAA CAG GTC GGC TTC CTG CCG TCC CGC GTA CCA CCT GCC CGC 1590 
Ser Pro Glu Gin Val Gly Phe Leu Pro Ser Arg Val Pro Pro Ala Arg 
505 510 515 

GTC TTT CAC GAC AGG GAA GAG CTT GAG GTC CTT CGC GAA GCT GGC TGC 1638 
Val Phe His Asp Arg Glu Glu Leu Glu Val Leu Arg Glu Ala Gly Cys 
520 525 530 

TAC AAC GAA CGT CCG GTA CCT TCC ACT CCT CCT GTG GAG GAG CGC CAA 1686 
Tyr Asn Glu Arg Pro Val Pro Ser Thr Pro Pro Val Glu Glu Pro Gin 
535 540 545 550 
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GGT TTC GAG GCC GAG TTG TGG CAC GCG ACC GCG GCG TCA CTC CCG GAG 1734 
Gly Phe Asp Ala Asp Leu Trp His Ala Thr Ala Ala Ser Leu Pro Glu 
555 560 565 

TAG CGC GCC ACC TTG GAG GGA GGT CTC AAC ACC GAC GTC AAG CAG CTC 1782 
Tyr Arc Ala Thr Leu Gin Ala Gly Leu Asn Thr Asp Val Lys Gin Leu 
^ ^ 570 575 580 

AAG ATC ACC CTC GAG AAG GCC CTC AAG ACC ATC GAC GGG CTC ACC CTC 1830 
Lys He Thr Leu Glu Asn Ala Leu Lys Thr He Asp Gly Leu Thr Leu 
585 590 595 

TCC CCA GTC AGA GGC CTC GAG ATG TAG GAG GGC CCG CCA GGG ACC GGG 1878 
Ser Pro Val Arg Gly Leu Glu Met Tyr Glu Gly Pro Pro Gly Ser Gly 
600 605 610 

AAG ACG GGC ACC CTC ATC GCC GCC CTT GAG GGC GGG GGC GGT AAA GGA 1926 
Lys Thr Gly Thr Leu He Ala Ala Leu Glu Ala Ala Gly Gly Lys Ala 
615 620 625 630 

CTT TAG GTG GCA GCC ACC AGA GAA CTG AGA GAG GGT ATG GAC CGG CGG 1974 
Leu Tyr Val Ala Pro Thr Arg Glu Leu Arg Glu Ala Met Asp Arg Arg 
635 640 645 

ATC AAA CCG CCG TCC GCC TCG GGT ACG CAA CAT GTG GCC CTT GCG ATT 2022 
He Lys Pro Pro Ser Ala Ser Ala Thr Gin His Val Ala Leu Ala He 
650 655 660 

CTC CGT GGT GCC ACC GCC GAG GGC GCC CCT TTC GGT ACC GTG GTT ATC 2070 
Leu Arg Arg Ala Thr Ala Glu Gly Ala Pro Phe Ala Thr Val Val He 
665 670 675 

GAC GAG TGC TTC ATG TTC CCG CTC GTG TAG GTC GCG ATC GTG CAC GCC 2118 
Asp Glu Cys Phe Met Phe Pro Leu Val Tyr Val Ala He Val His Ala 
680 685 690 

TTG TCC CCG AGC TCA GGA ATA GTC CTT GTA GGG GAC GTC CAC CAA ATC 2166 
Leu Ser Pro Ser Ser Arg He Val Leu Val Gly Asp Val His Gin He 
695 700 705 710 

GGG TTT ATA GAC TTC CAA GGC AGA AGC GCG AAC ATG CCG CTG GTT CGC 2214 
Gly Phe He Asp Phe Gin Gly Thr Ser Ala Asn Met Pro Leu Val Arg 
715 720 725 

GAC GTC GTT AAG CAG TGC CGT CGG CGC ACT TTC AAC CAA ACC AAG CGC 2262 
Asp Val Val Lys Gin Cys Arg Arg Arg Thr Phe Asn Gin Thr Lys Arg 
730 735 740 

TGT CCG GCC GAC GTC GTT GCC ACG ACG TTT TTC CAG AGC TTG TAG GCC 2310 
Cys Pro Ala Asp Val Val Ala Thr Thr Phe Phe Gin Ser Leu Tyr Pro 
745 750 755 

GGG TGC ACA ACC ACC TCA GGG TGC GTC GCA TCC ATC AGC CAC GTC GCC 2358 
Gly Cys Thr Thr Thr Ser Gly Cys Val Ala Ser He Ser His Val Ala 
760 765 770 

CCA GAC TAG CGC AAC AGC CAG GCG CAA ACG CTC TGC TTC ACG CAG GAG 2406 
Pro Asp Tyr Are Asn Ser Gin Ala Gin Thr Leu Cys Phe Thr Gin Glu 
775 780 785 790 

GAA AAG TCG CGC CAC GGG GCT GAG GGC GCG ATG ACT GTG CAC GAA GCG 2454
Glu Lys Ser Arg His Gly Ala Glu Gly Ala Met Thr Val His Glu Ala
795 800 805

CAG GGA CGC ACT TTT GCG TCT GTC ATT CTG CAT TAC AAC GGC TCC ACA 2502
Gln Gly Arg Thr Phe Ala Ser Val Ile Leu His Tyr Asn Gly Ser Thr
810 815 820

GCA GAG CAG AAG CTC CTC GCT GAG AAG TCG CAC CTT CTA GTC GGC ATC 2550
Ala Glu Gln Lys Leu Leu Ala Glu Lys Ser His Leu Leu Val Gly Ile
825 830 835

ACG CGC CAC ACC AAC CAC CTG TAC ATC CGC GAC CCG ACA GGT GAC ATT 2598
Thr Arg His Thr Asn His Leu Tyr Ile Arg Asp Pro Thr Gly Asp Ile
840 845 850

GAG AGA CAA CTC AAC CAT AGC GCG AAA GCC GAG GTG TTT ACA GAC ATC 2646
Glu Arg Gln Leu Asn His Ser Ala Lys Ala Glu Val Phe Thr Asp Ile
855 860 865 870

CCT GCA CCC CTG GAG ATC ACG ACT GTC AAA CCG AGT GAA GAG GTG CAG 2694
Pro Ala Pro Leu Glu Ile Thr Thr Val Lys Pro Ser Glu Glu Val Gln
875 880 885

CGC AAC GAA GTG ATG GCA ACG ATA CCC CCG CAG AGT GCC ACG CCG CAC 2742
Arg Asn Glu Val Met Ala Thr Ile Pro Pro Gln Ser Ala Thr Pro His
890 895 900

GGA GCA ATC CAT CTG CTC CGC AAG AAC TTC GGG GAC CAA CCC GAC TGT 2790
Gly Ala Ile His Leu Leu Arg Lys Asn Phe Gly Asp Gln Pro Asp Cys
905 910 915

GGC TGT GTC GCT TTG GCG AAG ACC GGC TAC GAG GTG TTT GGC GGT CGT 2838
Gly Cys Val Ala Leu Ala Lys Thr Gly Tyr Glu Val Phe Gly Gly Arg
920 925 930

GCC AAA ATC AAC GTA GAG CTT GCC GAA CCC GAC GCG ACC CCG AAG CCG 2886
Ala Lys Ile Asn Val Glu Leu Ala Glu Pro Asp Ala Thr Pro Lys Pro
935 940 945 950

CAT AGG GCG TTC CAG GAA GGG GTA CAG TGG GTC AAG GTC ACC AAC GCG 2934
His Arg Ala Phe Gln Glu Gly Val Gln Trp Val Lys Val Thr Asn Ala
955 960 965

TCT AAC AAA CAC CAG GCG CTC CAG ACG CTG TTG TCC CGC TAC ACC AAG 2982
Ser Asn Lys His Gln Ala Leu Gln Thr Leu Leu Ser Arg Tyr Thr Lys
970 975 980

CGA AGC GCT GAC CTG CCG CTA CAC GAA GCT AAG GAG GAC GTC AAA CGC 3030
Arg Ser Ala Asp Leu Pro Leu His Glu Ala Lys Glu Asp Val Lys Arg
985 990 995

ATG CTA AAC TCG CTT GAC CGA CAT TGG GAC TGG ACT GTC ACT GAA GAC 3078
Met Leu Asn Ser Leu Asp Arg His Trp Asp Trp Thr Val Thr Glu Asp
1000 1005 1010

GCC CGT GAC CGA GCT GTC TTC GAG ACC CAG CTC AAG TTC ACC CAA CGC 3126
Ala Arg Asp Arg Ala Val Phe Glu Thr Gln Leu Lys Phe Thr Gln Arg
1015 1020 1025 1030

GGC GGC ACC GTC GAA GAC CTG CTG GAG CCA GAC GAC CCC TAC ATC CGT 3174
Gly Gly Thr Val Glu Asp Leu Leu Glu Pro Asp Asp Pro Tyr Ile Arg
1035 1040 1045

GAC ATA GAC TTC CTT ATG AAG ACT CAG CAG AAA GTG TCG CCC AAG CCG 3222
Asp Ile Asp Phe Leu Met Lys Thr Gln Gln Lys Val Ser Pro Lys Pro
1050 1055 1060

ATC AAT ACG GGC AAG GTC GGG CAG GGG ATC GCC GCT CAC TCA AAG TCT 3270
Ile Asn Thr Gly Lys Val Gly Gln Gly Ile Ala Ala His Ser Lys Ser
1065 1070 1075

CTC AAC TTC GTC CTC GCC GCT TGG ATA CGC ATA CTC GAG GAG ATA CTC 3318
Leu Asn Phe Val Leu Ala Ala Trp Ile Arg Ile Leu Glu Glu Ile Leu
1080 1085 1090

CGT ACC GGG AGC CGC ACG GTC CGG TAC AGC AAC GGT CTC CCC GAC GAA 3366
Arg Thr Gly Ser Arg Thr Val Arg Tyr Ser Asn Gly Leu Pro Asp Glu
1095 1100 1105 1110

GAA GAG GCC ATG CTG CTC GAA GCG AAG ATC AAT CAA GTC CCA CAC GCC 3414
Glu Glu Ala Met Leu Leu Glu Ala Lys Ile Asn Gln Val Pro His Ala
1115 1120 1125

ACG TTC GTC TCG GCG GAC TGG ACC GAG TTT GAC ACC GCC CAC AAT AAC 3462
Thr Phe Val Ser Ala Asp Trp Thr Glu Phe Asp Thr Ala His Asn Asn
1130 1135 1140

ACG AGT GAG CTG CTC TTC GCC GCC CTT TTA GAG CGC ATC GGC ACG CCT 3510
Thr Ser Glu Leu Leu Phe Ala Ala Leu Leu Glu Arg Ile Gly Thr Pro
1145 1150 1155

GCA GCT GCC GTT AAT CTA TTC AGA GAA CGG TGT GGG AAA CGC ACC TTG 3558
Ala Ala Ala Val Asn Leu Phe Arg Glu Arg Cys Gly Lys Arg Thr Leu
1160 1165 1170

CGA GCG AAG GGT CTA GGC TCC GTT GAA GTC GAC GGT CTG CTC GAC TCC 3606
Arg Ala Lys Gly Leu Gly Ser Val Glu Val Asp Gly Leu Leu Asp Ser
1175 1180 1185 1190

GGC GCA GCT TGG ACG CCT TGC CGC AAC ACC ATC TTC TCT GCC GCC GTC 3654
Gly Ala Ala Trp Thr Pro Cys Arg Asn Thr Ile Phe Ser Ala Ala Val
1195 1200 1205

ATG CTC ACG CTC TTC CGC GGC GTC AAG TTC GCA GCT TTC AAA GGC GAC 3702
Met Leu Thr Leu Phe Arg Gly Val Lys Phe Ala Ala Phe Lys Gly Asp
1210 1215 1220

GAC TCG CTC CTC TGT GGT AGC CAT TAC CTC CGT TTC GAC GCT AGC CGC 3750
Asp Ser Leu Leu Cys Gly Ser His Tyr Leu Arg Phe Asp Ala Ser Arg
1225 1230 1235

CTT CAC ATG GGC GAA CGT TAC AAG ACC AAA CAT TTG AAG GTC GAG GTG 3798
Leu His Met Gly Glu Arg Tyr Lys Thr Lys His Leu Lys Val Glu Val
1240 1245 1250

CAG AAA ATC GTG CCG TAC ATC GGA CTC CTC GTC TCC GCT GAG CAG GTC 3846
Gln Lys Ile Val Pro Tyr Ile Gly Leu Leu Val Ser Ala Glu Gln Val
1255 1260 1265 1270 

GTC CTC GAC CCT GTC AGG AGC GCT CTC AAG ATA TTT GGG CGC TGC TAC 3894
Val Leu Asp Pro Val Arg Ser Ala Leu Lys Ile Phe Gly Arg Cys Tyr
1275 1280 1285

ACA AGC GAA CTC CTT TAC TCC AAG TAC GTG GAG GCT GTG AGA GAC ATC 3942
Thr Ser Glu Leu Leu Tyr Ser Lys Tyr Val Glu Ala Val Arg Asp Ile
1290 1295 1300

ACC AAG GGC TGG AGT GAC GCC CGC TAC CAC AGC CTC CTG TGC CAC ATG 3990
Thr Lys Gly Trp Ser Asp Ala Arg Tyr His Ser Leu Leu Cys His Met
1305 1310 1315

TCA GCA TGC TAC TAC AAT TAC GCG CCG GAG TCT GCG GCG TAC ATC ATC 4038
Ser Ala Cys Tyr Tyr Asn Tyr Ala Pro Glu Ser Ala Ala Tyr Ile Ile
1320 1325 1330

GAC GCT GTT GTT CGC TTT GGG CGC GGC GAC TTC CCG TTT GAA CAA CTG 4086
Asp Ala Val Val Arg Phe Gly Arg Gly Asp Phe Pro Phe Glu Gln Leu
1335 1340 1345 1350

CGC GTG GTG CGT GCC CAT GTG CAG GCA CCC GAC GCT TAC AGC AGC ACG 4134
Arg Val Val Arg Ala His Val Gln Ala Pro Asp Ala Tyr Ser Ser Thr
1355 1360 1365

TAT CCG GCT AAC GTG CGC GCA TCG TGC CTT GAC CAC GTC TTC GAG CCC 4182
Tyr Pro Ala Asn Val Arg Ala Ser Cys Leu Asp His Val Phe Glu Pro
1370 1375 1380

CGC CAG GCC GCG GCC CCG GCA GGT TTC GTT GCG ACA TGT GCG AAG CCG 4230
Arg Gln Ala Ala Ala Pro Ala Gly Phe Val Ala Thr Cys Ala Lys Pro
1385 1390 1395

GAA ACG CCT TCT TCA CTT ACC GCG AAA GCT GGT GTT TCT GCG ACT ACA 4278
Glu Thr Pro Ser Ser Leu Thr Ala Lys Ala Gly Val Ser Ala Thr Thr
1400 1405 1410

AGC CAC GTT GCG ACT GGG ACT GCG CCC CCG GAG TCT CCA TGG GAT GCA 4326
Ser His Val Ala Thr Gly Thr Ala Pro Pro Glu Ser Pro Trp Asp Ala
1415 1420 1425 1430

CCT GCA GCC AAC AGC TTT TCG GAG TTA TTG ACA CCG GAG ACC CCG TCC 4374
Pro Ala Ala Asn Ser Phe Ser Glu Leu Leu Thr Pro Glu Thr Pro Ser
1435 1440 1445

ACA TCA TCC TCG CCG TCA TCG TCT TCA TCG GAC TCC TCT ACA TCG TGT 4422
Thr Ser Ser Ser Pro Ser Ser Ser Ser Ser Asp Ser Ser Thr Ser Cys
1450 1455 1460

GGA AGG TCG CTC AGT GGT GGA GAC ACC GCA AGG ACC ACA GAA GAC TTG 4470
Gly Arg Ser Leu Ser Gly Gly Asp Thr Ala Arg Thr Thr Glu Asp Leu
1465 1470 1475

AAC AGC AGA AAG CCG CCT TCG CAA GAC AGG CAA TCA CGC TCG TCT GAA 4518
Asn Ser Arg Lys Pro Pro Ser Gln Asp Arg Gln Ser Arg Ser Ser Glu
1480 1485 1490

TGT CTG GAC AGA AGC GGA GAA AGG ACA GGC AGT TCG TTA ACT GCC CCC 4566
Cys Leu Asp Arg Ser Gly Glu Arg Thr Gly Ser Ser Leu Thr Ala Pro
1495 1500 1505 1510 

ACT GCT CCG AGC CCC TCA TTC TCA TTT TCG GAA AGA GCT CGA CTG GCG 4614
Thr Ala Pro Ser Pro Ser Phe Ser Phe Ser Glu Arg Ala Arg Leu Ala
1515 1520 1525

ACC GGG CCG ACT GTC GCC GCT GCG ACA TCA CCT TCG GCA ACC CCA TCC 4662
Thr Gly Pro Thr Val Ala Ala Ala Thr Ser Pro Ser Ala Thr Pro Ser
1530 1535 1540

TGC GCC ACG GAC CAG GTT GCC GCG AGG ACC ACG CCG GAC TTT GCG CCT 4710
Cys Ala Thr Asp Gln Val Ala Ala Arg Thr Thr Pro Asp Phe Ala Pro
1545 1550 1555

TTC CTG GGT TCC CAG TCT GCC CGT GCT GTC TCG AAG CCG TAC CGG CCC 4758
Phe Leu Gly Ser Gln Ser Ala Arg Ala Val Ser Lys Pro Tyr Arg Pro
1560 1565 1570

CCC ACG ACT GCC CGT TGG AAA GAA GTC ACC CCG CTC CAC GCG TGG AAG 4806
Pro Thr Thr Ala Arg Trp Lys Glu Val Thr Pro Leu His Ala Trp Lys
1575 1580 1585 1590

GGC GTG ACC GGA GAC CGA CCG GAA GTC AGG GAG GAC CCG GAG ACA GCG 4854
Gly Val Thr Gly Asp Arg Pro Glu Val Arg Glu Asp Pro Glu Thr Ala
1595 1600 1605

GCG GTC GTC CAG GCT CTG ATC AGC GGC CGT TAT CCT CAG AAG ACG AAG 4902
Ala Val Val Gln Ala Leu Ile Ser Gly Arg Tyr Pro Gln Lys Thr Lys
1610 1615 1620

CTT TCC TCC GAC GCA TCC AAA GGC TAC TCA AGA ACT AAG GGA TGC TCA 4950
Leu Ser Ser Asp Ala Ser Lys Gly Tyr Ser Arg Thr Lys Gly Cys Ser
1625 1630 1635

CAA TCC ACC TCT TTT CCT GCC CCG AGT GCG GAT TAC CAG GCC CGC GAC 4998
Gln Ser Thr Ser Phe Pro Ala Pro Ser Ala Asp Tyr Gln Ala Arg Asp
1640 1645 1650

TGC CAG ACA GTC CGA GTC TGC CGC GCC GCT GCA GAG ATG GCG CGC TCA 5046
Cys Gln Thr Val Arg Val Cys Arg Ala Ala Ala Glu Met Ala Arg Ser
1655 1660 1665 1670

TGT ATT CAC GAG CCG TTG GCT TCA TCT GCC GCC AGT GCC GAC TTG AAG 5094
Cys Ile His Glu Pro Leu Ala Ser Ser Ala Ala Ser Ala Asp Leu Lys
1675 1680 1685

CGC ATA CGC TCT ACC TCG GAC TCT GTT CCC GAT GTA AAG ATC AGC AAG 5142
Arg Ile Arg Ser Thr Ser Asp Ser Val Pro Asp Val Lys Ile Ser Lys
1690 1695 1700 

AGC GCA TGAAGGAACA AAATTAGTTT CCTTGTTCGT AAACAAGGTG GTCCCTCCCA 5198 
Ser Ala 

TTGAGGTAAA GACTCTGGTG AGTCCTCAAC GTTACTCGTT GAGTCTGCTG CGGTTCGATT 5258 

CCATTCCCAA GCAGCAAAGG GTGCGCAACT AGTACGGCGC CCCCTGGGAT ACCA 5312 

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1703 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:

Met Tyr Ala Lys Ala Thr Asp Val Ala Arg Val Tyr Ala Ala Ala Asp
1 5 10 15

Val Ala Tyr Ala Asn Val Leu Gln Gln Arg Ala Val Lys Leu Asp Phe
20 25 30

Ala Pro Pro Leu Lys Ala Leu Glu Thr Leu His Arg Leu Tyr Tyr Pro
35 40 45

Leu Arg Phe Lys Gly Gly Thr Leu Pro Pro Thr Gln His Pro Ile Leu
50 55 60

Ala Gly His Gln Arg Val Ala Glu Glu Val Leu His Asn Phe Ala Arg
65 70 75 80

Gly Arg Ser Thr Val Leu Glu Ile Gly Pro Ser Leu His Ser Ala Leu
85 90 95

Lys Leu His Gly Ala Pro Asn Ala Pro Val Ala Asp Tyr His Gly Cys
100 105 110

Thr Lys Tyr Gly Thr Arg Asp Gly Ser Arg His Ile Thr Ala Leu Glu
115 120 125

Ser Arg Ser Val Ala Thr Gly Arg Pro Glu Phe Lys Ala Asp Ala Ser
130 135 140

Leu Leu Ala Asn Gly Ile Ala Ser Arg Thr Phe Cys Val Asp Gly Val
145 150 155 160

Gly Ser Cys Ala Phe Lys Ser Arg Val Gly Ile Ala Asn His Ser Leu
165 170 175

Tyr Asp Val Thr Leu Glu Glu Leu Ala Asn Ala Phe Glu Asn His Gly
180 185 190

Leu His Met Val Arg Ala Phe Met His Met Pro Glu Glu Leu Leu Tyr
195 200 205

Met Asp Asn Val Val Asn Ala Glu Leu Gly Tyr Arg Phe His Val Ile
210 215 220

Glu Glu Pro Met Ala Val Lys Asp Cys Ala Phe Gln Gly Gly Asp Leu
225 230 235 240

Arg Leu His Phe Pro Glu Leu Asp Phe Ile Asn Glu Ser Gln Glu Arg
245 250 255

Arg Ile Glu Arg Leu Ala Ala Arg Gly Ser Tyr Ser Arg Arg Ala Val
260 265 270 






Ile Phe Ser Gly Asp Asp Asp Trp Gly Asp Ala Tyr Leu His Asp Phe
275 280 285

His Thr Trp Leu Ala Tyr Leu Leu Val Arg Asn Tyr Pro Thr Pro Phe
290 295 300

Gly Phe Ser Leu His Ile Glu Val Gln Arg Arg His Gly Ser Ser Ile
305 310 315 320

Glu Leu Arg Ile Thr Arg Ala Pro Pro Gly Asp Arg Met Leu Ala Val
325 330 335

Val Pro Arg Thr Ser Gln Gly Leu Cys Arg Ile Pro Asn Ile Phe Tyr
340 345 350

Tyr Ala Asp Ala Ser Gly Thr Glu His Lys Thr Ile Leu Thr Ser Gln
355 360 365

His Lys Val Asn Met Leu Leu Asn Phe Met Gln Thr Arg Pro Glu Lys
370 375 380

Glu Leu Val Asp Met Thr Val Leu Met Ser Phe Ala Arg Ala Arg Leu
385 390 395 400

Arg Ala Ile Val Val Ala Ser Glu Val Thr Glu Ser Ser Trp Asn Ile
405 410 415

Ser Pro Ala Asp Leu Val Arg Thr Val Val Ser Leu Tyr Val Leu His
420 425 430

Ile Ile Glu Arg Arg Arg Ala Ala Val Ala Val Lys Thr Ala Lys Asp
435 440 445

Asp Val Phe Gly Glu Thr Ser Phe Trp Glu Ser Leu Lys His Val Leu
450 455 460

Gly Ser Cys Cys Gly Leu Arg Asn Leu Lys Gly Thr Asp Val Val Phe
465 470 475 480

Thr Lys Arg Val Val Asp Lys Tyr Arg Val His Ser Leu Gly Asp Ile
485 490 495

Ile Cys Asp Val Arg Leu Ser Pro Glu Gln Val Gly Phe Leu Pro Ser
500 505 510

Arg Val Pro Pro Ala Arg Val Phe His Asp Arg Glu Glu Leu Glu Val
515 520 525

Leu Arg Glu Ala Gly Cys Tyr Asn Glu Arg Pro Val Pro Ser Thr Pro
530 535 540

Pro Val Glu Glu Pro Gln Gly Phe Asp Ala Asp Leu Trp His Ala Thr
545 550 555 560

Ala Ala Ser Leu Pro Glu Tyr Arg Ala Thr Leu Gln Ala Gly Leu Asn
565 570 575

Thr Asp Val Lys Gln Leu Lys Ile Thr Leu Glu Asn Ala Leu Lys Thr
580 585 590 

Ile Asp Gly Leu Thr Leu Ser Pro Val Arg Gly Leu Glu Met Tyr Glu
595 600 605

Gly Pro Pro Gly Ser Gly Lys Thr Gly Thr Leu Ile Ala Ala Leu Glu
610 615 620

Ala Ala Gly Gly Lys Ala Leu Tyr Val Ala Pro Thr Arg Glu Leu Arg
625 630 635 640

Glu Ala Met Asp Arg Arg Ile Lys Pro Pro Ser Ala Ser Ala Thr Gln
645 650 655

His Val Ala Leu Ala Ile Leu Arg Arg Ala Thr Ala Glu Gly Ala Pro
660 665 670

Phe Ala Thr Val Val Ile Asp Glu Cys Phe Met Phe Pro Leu Val Tyr
675 680 685

Val Ala Ile Val His Ala Leu Ser Pro Ser Ser Arg Ile Val Leu Val
690 695 700

Gly Asp Val His Gln Ile Gly Phe Ile Asp Phe Gln Gly Thr Ser Ala
705 710 715 720

Asn Met Pro Leu Val Arg Asp Val Val Lys Gln Cys Arg Arg Arg Thr
725 730 735

Phe Asn Gln Thr Lys Arg Cys Pro Ala Asp Val Val Ala Thr Thr Phe
740 745 750

Phe Gln Ser Leu Tyr Pro Gly Cys Thr Thr Thr Ser Gly Cys Val Ala
755 760 765

Ser Ile Ser His Val Ala Pro Asp Tyr Arg Asn Ser Gln Ala Gln Thr
770 775 780

Leu Cys Phe Thr Gln Glu Glu Lys Ser Arg His Gly Ala Glu Gly Ala
785 790 795 800

Met Thr Val His Glu Ala Gln Gly Arg Thr Phe Ala Ser Val Ile Leu
805 810 815

His Tyr Asn Gly Ser Thr Ala Glu Gln Lys Leu Leu Ala Glu Lys Ser
820 825 830

His Leu Leu Val Gly Ile Thr Arg His Thr Asn His Leu Tyr Ile Arg
835 840 845

Asp Pro Thr Gly Asp Ile Glu Arg Gln Leu Asn His Ser Ala Lys Ala
850 855 860

Glu Val Phe Thr Asp Ile Pro Ala Pro Leu Glu Ile Thr Thr Val Lys
865 870 875 880

Pro Ser Glu Glu Val Gln Arg Asn Glu Val Met Ala Thr Ile Pro Pro
885 890 895

Gln Ser Ala Thr Pro His Gly Ala Ile His Leu Leu Arg Lys Asn Phe
900 905 910 

Gly Asp Gln Pro Asp Cys Gly Cys Val Ala Leu Ala Lys Thr Gly Tyr
915 920 925

Glu Val Phe Gly Gly Arg Ala Lys Ile Asn Val Glu Leu Ala Glu Pro
930 935 940

Asp Ala Thr Pro Lys Pro His Arg Ala Phe Gln Glu Gly Val Gln Trp
945 950 955 960

Val Lys Val Thr Asn Ala Ser Asn Lys His Gln Ala Leu Gln Thr Leu
965 970 975

Leu Ser Arg Tyr Thr Lys Arg Ser Ala Asp Leu Pro Leu His Glu Ala
980 985 990

Lys Glu Asp Val Lys Arg Met Leu Asn Ser Leu Asp Arg His Trp Asp
995 1000 1005

Trp Thr Val Thr Glu Asp Ala Arg Asp Arg Ala Val Phe Glu Thr Gln
1010 1015 1020

Leu Lys Phe Thr Gln Arg Gly Gly Thr Val Glu Asp Leu Leu Glu Pro
1025 1030 1035 1040

Asp Asp Pro Tyr Ile Arg Asp Ile Asp Phe Leu Met Lys Thr Gln Gln
1045 1050 1055

Lys Val Ser Pro Lys Pro Ile Asn Thr Gly Lys Val Gly Gln Gly Ile
1060 1065 1070

Ala Ala His Ser Lys Ser Leu Asn Phe Val Leu Ala Ala Trp Ile Arg
1075 1080 1085

Ile Leu Glu Glu Ile Leu Arg Thr Gly Ser Arg Thr Val Arg Tyr Ser
1090 1095 1100

Asn Gly Leu Pro Asp Glu Glu Glu Ala Met Leu Leu Glu Ala Lys Ile
1105 1110 1115 1120

Asn Gln Val Pro His Ala Thr Phe Val Ser Ala Asp Trp Thr Glu Phe
1125 1130 1135

Asp Thr Ala His Asn Asn Thr Ser Glu Leu Leu Phe Ala Ala Leu Leu
1140 1145 1150

Glu Arg Ile Gly Thr Pro Ala Ala Ala Val Asn Leu Phe Arg Glu Arg
1155 1160 1165

Cys Gly Lys Arg Thr Leu Arg Ala Lys Gly Leu Gly Ser Val Glu Val
1170 1175 1180

Asp Gly Leu Leu Asp Ser Gly Ala Ala Trp Thr Pro Cys Arg Asn Thr
1185 1190 1195 1200

Ile Phe Ser Ala Ala Val Met Leu Thr Leu Phe Arg Gly Val Lys Phe
1205 1210 1215 

Ala Ala Phe Lys Gly Asp Asp Ser Leu Leu Cys Gly Ser His Tyr Leu 
1220 1225 1230 

Arg Phe Asp Ala Ser Arg Leu His Met Gly Glu Arg Tyr Lys Thr Lys
1235 1240 1245

His Leu Lys Val Glu Val Gln Lys Ile Val Pro Tyr Ile Gly Leu Leu
1250 1255 1260

Val Ser Ala Glu Gln Val Val Leu Asp Pro Val Arg Ser Ala Leu Lys
1265 1270 1275 1280

Ile Phe Gly Arg Cys Tyr Thr Ser Glu Leu Leu Tyr Ser Lys Tyr Val
1285 1290 1295

Glu Ala Val Arg Asp Ile Thr Lys Gly Trp Ser Asp Ala Arg Tyr His
1300 1305 1310

Ser Leu Leu Cys His Met Ser Ala Cys Tyr Tyr Asn Tyr Ala Pro Glu
1315 1320 1325

Ser Ala Ala Tyr Ile Ile Asp Ala Val Val Arg Phe Gly Arg Gly Asp
1330 1335 1340

Phe Pro Phe Glu Gln Leu Arg Val Val Arg Ala His Val Gln Ala Pro
1345 1350 1355 1360

Asp Ala Tyr Ser Ser Thr Tyr Pro Ala Asn Val Arg Ala Ser Cys Leu
1365 1370 1375

Asp His Val Phe Glu Pro Arg Gln Ala Ala Ala Pro Ala Gly Phe Val
1380 1385 1390

Ala Thr Cys Ala Lys Pro Glu Thr Pro Ser Ser Leu Thr Ala Lys Ala
1395 1400 1405

Gly Val Ser Ala Thr Thr Ser His Val Ala Thr Gly Thr Ala Pro Pro
1410 1415 1420

Glu Ser Pro Trp Asp Ala Pro Ala Ala Asn Ser Phe Ser Glu Leu Leu
1425 1430 1435 1440

Thr Pro Glu Thr Pro Ser Thr Ser Ser Ser Pro Ser Ser Ser Ser Ser
1445 1450 1455

Asp Ser Ser Thr Ser Cys Gly Arg Ser Leu Ser Gly Gly Asp Thr Ala
1460 1465 1470

Arg Thr Thr Glu Asp Leu Asn Ser Arg Lys Pro Pro Ser Gln Asp Arg
1475 1480 1485

Gln Ser Arg Ser Ser Glu Cys Leu Asp Arg Ser Gly Glu Arg Thr Gly
1490 1495 1500

Ser Ser Leu Thr Ala Pro Thr Ala Pro Ser Pro Ser Phe Ser Phe Ser
1505 1510 1515 1520

Glu Arg Ala Arg Leu Ala Thr Gly Pro Thr Val Ala Ala Ala Thr Ser
1525 1530 1535

Pro Ser Ala Thr Pro Ser Cys Ala Thr Asp Gln Val Ala Ala Arg Thr
1540 1545 1550 

Thr Pro Asp Phe Ala Pro Phe Leu Gly Ser Gln Ser Ala Arg Ala Val
1555 1560 1565

Ser Lys Pro Tyr Arg Pro Pro Thr Thr Ala Arg Trp Lys Glu Val Thr
1570 1575 1580

Pro Leu His Ala Trp Lys Gly Val Thr Gly Asp Arg Pro Glu Val Arg
1585 1590 1595 1600

Glu Asp Pro Glu Thr Ala Ala Val Val Gln Ala Leu Ile Ser Gly Arg
1605 1610 1615

Tyr Pro Gln Lys Thr Lys Leu Ser Ser Asp Ala Ser Lys Gly Tyr Ser
1620 1625 1630

Arg Thr Lys Gly Cys Ser Gln Ser Thr Ser Phe Pro Ala Pro Ser Ala
1635 1640 1645

Asp Tyr Gln Ala Arg Asp Cys Gln Thr Val Arg Val Cys Arg Ala Ala
1650 1655 1660

Ala Glu Met Ala Arg Ser Cys Ile His Glu Pro Leu Ala Ser Ser Ala
1665 1670 1675 1680

Ala Ser Ala Asp Leu Lys Arg Ile Arg Ser Thr Ser Asp Ser Val Pro
1685 1690 1695

Asp Val Lys Ile Ser Lys Ser Ala
1700 



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5312 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 4218..4512

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:



GTTCTGCCTC CCCCGGACGG TAAATATAGG GGAACAATGT ACGCGAAAGC GACAGACGTG 60

GCGCGTGTCT ACGCCGCGGC AGATGTCGCC TACGCGAACG TACTGCAGCA GAGAGCAGTC 120

AAGTTGGACT TCGCCCCGCC ACTGAAGGCA CTAGAAACCC TCCACAGACT GTACTATCCG 180

CTGCGCTTCA AAGGGGGCAC TTTACCCCCG ACACAACACC CGATCCTGGC CGGGCACCAA 240

CGTGTCGCAG AAGAGGTTCT GCACAATTTC GCCAGGGGAC GTAGCACAGT GCTCGAGATA 300

GGGCCGTCTC TGCACAGCGC ACTTAAGCTA CATGGGGCAC CGAACGCCCC CGTCGCAGAC 360

TATCACGGGT GCACCAAGTA CGGCACCCGC GACGGCTCGC GACACATTAC GGCCTTAGAG 420

TCTAGATCCG TCGCCACAGG CCGGCCCGAG TTCAAGGCCG ACGCCTCACT GCTCGCCAAC 480 

GGCATTGCCT CCCGCACCTT CTGCGTCGAC GGAGTCGGCT CTTGCGCGTT CAAATCGCGC 540 

GTTGGAATTG CCAATCACTC CCTCTATGAC GTGACCCTAG AGGAGCTGGC CAATGCGTTT 600 

GAGAACCACG GACTTCACAT GGTCCGCGCG TTCATGCACA TGCCAGAAGA GCTGCTCTAC 660 

ATGGACAACG TGGTTAATGC CGAGCTCGGC TACCGCTTCC ACGTTATTGA AGAGCCTATG 720 

GCTGTGAAGG ACTGCGCATT CCAGGGGGGG GACCTCCGTC TCCACTTCCC TGAGTTGGAC 780 

TTCATCAACG AGAGCCAAGA GCGGCGCATC GAGAGGCTGG CCGCCCGCGG CTCCTACTCC 840 

AGACGCGCCG TCATTTTCTC CGGCGACGAC GACTGGGGTG ATGCGTACTT ACACGACTTC 900 

CACACATGGC TCGCCTACCT ACTGGTGAGG AACTACCCCA CTCCGTTTGG TTTCTCACTC 960 

CATATAGAAG TCCAGAGGCG CCACGGCTCC AGCATTGAGC TGCGCATCAC TCGCGCGCCA 1020 

CCTGGAGACC GCATGCTGGC CGTCGTCCCA AGGACGTCCC AAGGCCTCTG CAGAATCCCA 1080 

AACATCTTTT ATTACGCCGA CGCGTCGGGC ACTGAGCATA AGACCATCCT TACGTCACAG 1140 

CACAAAGTCA ACATGCTGCT CAATTTTATG CAAACGCGTC CTGAGAAGGA ACTAGTCGAC 1200

ATGACCGTCT TGATGTCGTT CGCGCGCGCT AGGCTGCGCG CGATCGTGGT CGCCTCAGAA 1260 

GTCACCGAGA GCTCCTGGAA CATCTCACCG GCTGACCTGG TCCGCACTGT CGTGTCTCTT 1320 

TACGTCCTCC ACATCATCGA GCGCCGAAGG GCTGCGGTCG CTGTCAAGAC CGCCAAGGAC 1380 

GACGTCTTTG GAGAGACTTC GTTCTGGGAG AGTCTCAAGC ACGTCTTGGG CTCCTGTTGC 1440 

GGTCTGCGCA ACCTCAAAGG CACCGACGTC GTCTTTACTA AGCGCGTCGT CGATAAGTAC 1500 

CGAGTCCACT CGCTCGGAGA CATAATCTGC GACGTCCGCC TGTCCCCTGA ACAGGTCGGC 1560 

TTCCTGCCGT CCCGCGTACC ACCTGCCCGC GTCTTTCACG ACAGGGAAGA GCTTGAGGTC 1620 

CTTCGCGAAG CTGGCTGCTA CAACGAACGT CCGGTACCTT CCACTCCTCC TGTGGAGGAG 1680 

CCCCAAGGTT TCGACGCCGA CTTGTGGCAC GCGACCGCGG CCTCACTCCC CGAGTACCGC 1740 

GCCACCTTGC AGGCAGGTCT CAACACCGAC GTCAAGCAGC TCAAGATCAC CCTCGAGAAC 1800 

GCCCTCAAGA CCATCGACGG GCTCACCCTC TCCCCAGTCA GAGGCCTCGA GATGTACGAG 1860 

GGCCCGCCAG GCAGCGGCAA GACGGGCACC CTCATCGCCG CCCTTGAGGC CGCGGGCGGT 1920 

AAAGCACTTT ACGTGGCACC CACCAGAGAA CTGAGAGAGG CTATGGACCG GCGGATCAAA 1980 

CCGCCGTCCG CCTCGGCTAC GCAACATGTC GCCCTTGCGA TTCTCCGTCG TGCCACCGCC 2040 

GAGGGCGCCC CTTTCGCTAC CGTGGTTATC GACGAGTGCT TCATGTTCCC GCTCGTGTAC 2100 

GTCGCGATCG TGCACGCCTT GTCCCCGAGC TCACGAATAG TCCTTGTAGG GGACGTCCAC 2160




CAAATCGGGT TTATAGACTT CCAAGGCACA AGCGCGAACA TGCCGCTCGT TCGCGACGTC 2220 

GTTAAGCAGT GCCGTCGGCG CACTTTCAAC CAAACCAAGC GCTGTCCGGC CGACGTCGTT 2280 

GCCACCACGT TTTTCCAGAG CTTGTACCCC GGGTGCACAA CCACCTCAGG GTGCGTCGCA 2340

TCCATCAGCC ACGTCGCCCC AGACTACCGC AACAGCCAGG CGCAAACGCT CTGCTTCACG 2400 

CAGGAGGAAA AGTCGCGCCA CGGGGCTGAG GGCGCGATGA CTGTGCACGA AGCGCAGGGA 2460 

CGCACTTTTG CGTCTGTCAT TCTGCATTAC AACGGCTCCA CAGCAGAGCA GAAGCTCCTC 2520 

GCTGAGAAGT CGCACCTTCT AGTCGGCATC ACGCGCCACA CCAACCACCT GTACATCCGC 2580 

GACCCGACAG GTGACATTGA GAGACAACTC AACCATAGCG CGAAAGCCGA GGTGTTTACA 2640 

GACATCCCTG CACCCCTGGA GATCACGACT GTCAAACCGA GTGAAGAGGT GCAGCGCAAC 2700 

GAAGTGATGG CAACGATACC CCCGCAGAGT GCCACGCCGC ACGGAGCAAT CCATCTGCTC 2760 

CGCAAGAACT TCGGGGACCA ACCCGACTGT GGCTGTGTCG CTTTGGCGAA GACCGGCTAC 2820 

GAGGTGTTTG GCGGTCGTGC CAAAATCAAC GTAGAGCTTG CCGAACCCGA CGCGACCCCG 2880 

AAGCCGCATA GGGCGTTCCA GGAAGGGGTA CAGTGGGTCA AGGTCACCAA CGCGTCTAAC 2940 

AAACACCAGG CGCTCCAGAC GCTGTTGTCC CGCTACACCA AGCGAAGCGC TGACCTGCCG 3000 

CTACACGAAG CTAAGGAGGA CGTCAAACGC ATGCTAAACT CGCTTGACCG ACATTGGGAC 3060 

TGGACTGTCA CTGAAGACGC CCGTGACCGA GCTGTCTTCG AGACCCAGCT CAAGTTCACC 3120 

CAACGCGGCG GCACCGTCGA AGACCTGCTG GAGCCAGACG ACCCCTACAT CCGTGACATA 3180 

GACTTCCTTA TGAAGACTCA GCAGAAAGTG TCGCCCAAGC CGATCAATAC GGGCAAGGTC 3240 

GGGCAGGGGA TCGCCGCTCA CTCAAAGTCT CTCAACTTCG TCCTCGCCGC TTGGATACGC 3300 

ATACTCGAGG AGATACTCCG TACCGGGAGC CGCACGGTCC GGTACAGCAA CGGTCTCCCC 3360 

GACGAAGAAG AGGCCATGCT GCTCGAAGCG AAGATCAATC AAGTCCCACA CGCCACGTTC 3420 

GTCTCGGCGG ACTGGACCGA GTTTGACACC GCCCACAATA ACACGAGTGA GCTGCTCTTC 3480 

GCCGCCCTTT TAGAGCGCAT CGGCACGCCT GCAGCTGCCG TTAATCTATT CAGAGAACGG 3540 

TGTGGGAAAC GCACCTTGCG AGCGAAGGGT CTAGGCTCCG TTGAAGTCGA CGGTCTGCTC 3600 

GACTCCGGCG CAGCTTGGAC GCCTTGCCGC AACACCATCT TCTCTGCCGC CGTCATGCTC 3660 

ACGCTCTTCC GCGGCGTCAA GTTCGCAGCT TTCAAAGGCG ACGACTCGCT CCTCTGTGGT 3720 

AGCCATTACC TCCGTTTCGA CGCTAGCCGC CTTCACATGG GCGAACGTTA CAAGACCAAA 3780 

CATTTGAAGG TCGAGGTGCA GAAAATCGTG CCGTACATCG GACTCCTCGT CTCCGCTGAG 3840 

CAGGTCGTCC TCGACCCTGT CAGGAGCGCT CTCAAGATAT TTGGGCGCTG CTACACAAGC 3900 

GAACTCCTTT ACTCCAAGTA CGTGGAGGCT GTGAGAGACA TCACCAAGGG CTGGAGTGAC 3960 

GCCCGCTACC ACAGCCTCCT GTGCCACATG TCAGCATGCT ACTACAATTA CGCGCCGGAG 4020

TCTGCGGCGT ACATCATCGA CGCTGTTGTT CGCTTTGGGC GCGGCGACTT CCCGTTTGAA 4080 

CAACTGCGCG TGGTGCGTGC CCATGTGCAG GCACCCGACG CTTACAGCAG CACGTATCCG 4140 

GCTAACGTGC GCGCATCGTG CCTTGACCAC GTCTTCGAGC CCCGCCAGGC CGCGGCCCCG 4200 

GCAGGTTTCG TTGCGAC ATG TGC GAA GCC GGA AAC GCC TTC TTC ACT TAC 4250
Met Cys Glu Ala Gly Asn Ala Phe Phe Thr Tyr
1 5 10

CGC GAA AGC TGG TGT TTC TGC GAC TAC AAG CCA CGT TGC GAC TGG GAC 4298
Arg Glu Ser Trp Cys Phe Cys Asp Tyr Lys Pro Arg Cys Asp Trp Asp
15 20 25

TGC GCC CCC GGA GTC TCC ATG GGA TGC ACC TGC AGC CAA CAG CTT TTC 4346
Cys Ala Pro Gly Val Ser Met Gly Cys Thr Cys Ser Gln Gln Leu Phe
30 35 40

GGA GTT ATT GAC ACC GGA GAC CCC GTC CAC ATC ATC CTC GCC GTC ATC 4394
Gly Val Ile Asp Thr Gly Asp Pro Val His Ile Ile Leu Ala Val Ile
45 50 55

GTC TTC ATC GGA CTC CTC TAC ATC GTG TGG AAG GTC GCT CAG TGG TGG 4442
Val Phe Ile Gly Leu Leu Tyr Ile Val Trp Lys Val Ala Gln Trp Trp
60 65 70 75

AGA CAC CGC AAG GAC CAC AGA AGA CTT GAA CAG CAG AAA GCC GCC TTC 4490
Arg His Arg Lys Asp His Arg Arg Leu Glu Gln Gln Lys Ala Ala Phe
80 85 90

GCA AGA CAG GCA ATC ACG CTC GTC TGAATGTC TGGACAGAAG CGGAGAAAGG 4542
Ala Arg Gln Ala Ile Thr Leu Val
95 



ACAGGCAGTT CGTTAACTGC CCCCACTGCT CCGAGCCCCT CATTCTCATT TTCGGAAAGA 4602

GCTCGACTGG CGACCGGGCC GACTGTCGCC GCTGCGACAT CACCTTCGGC AACCCCATCC 4662

TGCGCCACGG ACCAGGTTGC CGCGAGGACC ACGCCGGACT TTGCGCCTTT CCTGGGTTCC 4722

CAGTCTGCCC GTGCTGTCTC GAAGCCGTAC CGGCCCCCCA CGACTGCCCG TTGGAAAGAA 4782

GTCACCCCGC TCCACGCGTG GAAGGGCGTG ACCGGAGACC GACCGGAAGT CAGGGAGGAC 4842

CCGGAGACAG CGGCGGTCGT CCAGGCTCTG ATCAGCGGCC GTTATCCTCA GAAGACGAAG 4902

CTTTCCTCCG ACGCATCCAA AGGCTACTCA AGAACTAAGG GATGCTCACA ATCCACCTCT 4962

TTTCCTGCCC CGAGTGCGGA TTACCAGGCC CGCGACTGCC AGACAGTCCG AGTCTGCCGC 5022

GCCGCTGCAG AGATGGCGCG CTCATGTATT CACGAGCCGT TGGCTTCATC TGCCGCCAGT 5082

GCCGACTTGA AGCGCATACG CTCTACCTCG GACTCTGTTC CCGATGTAAA GATCAGCAAG 5142

AGCGCATGAA GGAACAAAAT TAGTTTCCTT GTTCGTAAAC AAGGTGGTCC CTCCCATTGA 5202

GGTAAAGACT CTGGTGAGTC CTCAACGTTA CTCGTTGAGT CTGCTGCGGT TCGATTCCAT 5262

TCCCAAGCAG CAAAGGGTGC GCAACTAGTA CGGCGCCCCC TGGGATACCA 5312

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 99 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:

Met Cys Glu Ala Gly Asn Ala Phe Phe Thr Tyr Arg Glu Ser Trp Cys
1 5 10 15

Phe Cys Asp Tyr Lys Pro Arg Cys Asp Trp Asp Cys Ala Pro Gly Val
20 25 30

Ser Met Gly Cys Thr Cys Ser Gln Gln Leu Phe Gly Val Ile Asp Thr
35 40 45

Gly Asp Pro Val His Ile Ile Leu Ala Val Ile Val Phe Ile Gly Leu
50 55 60

Leu Tyr Ile Val Trp Lys Val Ala Gln Trp Trp Arg His Arg Lys Asp
65 70 75 80

His Arg Arg Leu Glu Gln Gln Lys Ala Ala Phe Ala Arg Gln Ala Ile
85 90 95 

Thr Leu Val 



(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5312 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 4518..4937

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:



GTTCTGCCTC CCCCGGACGG TAAATATAGG GGAACAATGT ACGCGAAAGC GACAGACGTG 60 

GCGCGTGTCT ACGCCGCGGC AGATGTCGCC TACGCGAACG TACTGCAGCA GAGAGCAGTC 120 

AAGTTGGACT TCGCCCCGCC ACTGAAGGCA CTAGAAACCC TCCACAGACT GTACTATCCG 180 

CTGCGCTTCA AAGGGGGCAC TTTACCCCCG ACACAACACC CGATCCTGGC CGGGCACCAA 240

CGTGTCGCAG AAGAGGTTCT GCACAATTTC GCCAGGGGAC GTAGCACAGT GCTCGAGATA 300

GGGCCGTCTC TGCACAGCGC ACTTAAGCTA CATGGGGCAC CGAACGCCCC CGTCGCAGAC 360 

TATCACGGGT GCACCAAGTA CGGCACCCGC GACGGCTCGC GACACATTAC GGCCTTAGAG 420 

TCTAGATCCG TCGCCACAGG CCGGCCCGAG TTCAAGGCCG ACGCCTCACT GCTCGCCAAC 480 

GGCATTGCCT CCCGCACCTT CTGCGTCGAC GGAGTCGGCT CTTGCGCGTT CAAATCGCGC 540 

GTTGGAATTG CCAATCACTC CCTCTATGAC GTGACCCTAG AGGAGCTGGC CAATGCGTTT 600 

GAGAACCACG GACTTCACAT GGTCCGCGCG TTCATGCACA TGCCAGAAGA GCTGCTCTAC 660 

ATGGACAACG TGGTTAATGC CGAGCTCGGC TACCGCTTCC ACGTTATTGA AGAGCCTATG 720 

GCTGTGAAGG ACTGCGCATT CCAGGGGGGG GACCTCCGTC TCCACTTCCC TGAGTTGGAC 780 

TTCATCAACG AGAGCCAAGA GCGGCGCATC GAGAGGCTGG CCGCCCGCGG CTCCTACTCC 840 

AGACGCGCCG TCATTTTCTC CGGCGACGAC GACTGGGGTG ATGCGTACTT ACACGACTTC 900 

CACACATGGC TCGCCTACCT ACTGGTGAGG AACTACCCCA CTCCGTTTGG TTTCTCACTC 960 

CATATAGAAG TCCAGAGGCG CCACGGCTCC AGCATTGAGC TGCGCATCAC TCGCGCGCCA 1020 

CCTGGAGACC GCATGCTGGC CGTCGTCCCA AGGAGGTCCC AAGGCCTCTG CAGAATCCCA 1080 

AACATCTTTT ATTACGCCGA CGCGTCGGGC ACTGAGCATA AGACCATCCT TACGTCACAG 1140 

CACAAAGTCA ACATGCTGCT CAATTTTATG CAAACGCGTC CTGAGAAGGA ACTAGTCGAC 1200 

ATGACCGTCT TGATGTCGTT CGCGCGCGCT AGGCTGCGCG CGATCGTGGT CGCCTCAGAA 1260 

GTCACCGAGA GCTCCTGGAA CATCTCACCG GCTGACCTGG TCCGCACTGT CGTGTCTCTT 1320 

TACGTCCTCC ACATCATCGA GCGCCGAAGG GCTGCGGTCG CTGTCAAGAC CGCCAAGGAC 1380 

GACGTCTTTG GAGAGACTTC GTTCTGGGAG AGTCTCAAGC ACGTCTTGGG CTCCTGTTGC 1440 

GGTCTGCGCA ACCTCAAAGG CACCGACGTC GTCTTTACTA AGCGCGTCGT CGATAAGTAC 1500 

CGAGTCCACT CGCTCGGAGA CATAATCTGC GACGTCCGCC TGTCCCCTGA ACAGGTCGGC 1560 

TTCCTGCCGT CCCGCGTACC ACCTGCCCGC GTCTTTCACG ACAGGGAAGA GCTTGAGGTC 1620 

CTTCGCGAAG CTGGCTGCTA CAACGAACGT CCGGTACCTT CCACTCCTCC TGTGGAGGAG 1680 

CCCCAAGGTT TCGACGCCGA CTTGTGGCAC GCGACCGCGG CCTCACTCCC CGAGTACCGC 1740 

GCCACCTTGC AGGCAGGTCT CAACACCGAC GTCAAGCAGC TCAAGATCAC CCTCGAGAAC 1800 

GCCCTCAAGA CCATCGACGG GCTCACCCTC TCCCCAGTCA GAGGCCTCGA GATGTACGAG 1860 

GGCCCGCCAG GCAGCGGCAA GACGGGCACC CTCATCGCCG CCCTTGAGGC CGCGGGCGGT 1920 

AAAGCACTTT ACGTGGCACC CACCAGAGAA CTGAGAGAGG CTATGGACCG GCGGATCAAA 1980 

CCGCCGTCCG CCTCGGCTAC GCAACATGTC GCCCTTGCGA TTCTCCGTCG TGCCACCGCC 2040 

GAGGGCGCCC CTTTCGCTAC CGTGGTTATC GACGAGTGCT TCATGTTCCC GCTCGTGTAC 2100 

GTCGCGATCG TGCACGCCTT GTCCCCGAGC TCACGAATAG TCCTTGTAGG GGACGTCCAC 2160 
CAAATCGGGT TTATAGACTT CCAAGGCACA AGCGCGAACA TGCCGCTCGT TCGCGACGTC 2220 

GTTAAGCAGT GCCGTCGGCG CACTTTCAAC CAAACCAAGC GCTGTCCGGC CGACGTCGTT 2280 

GCCACCACGT TTTTCCAGAG CTTGTACCCC GGGTGCACAA CCACCTCAGG GTGCGTCGCA 2340 

TCCATCAGCC ACGTCGCCCC AGACTACCGC AACAGCCAGG CGCAAACGCT CTGCTTCACG 2400 

CAGGAGGAAA AGTCGCGCCA CGGGGCTGAG GGCGCGATGA CTGTGCACGA AGCGCAGGGA 2460 

CGCACTTTTG CGTCTGTCAT TCTGCATTAC AACGGCTCCA CAGCAGAGCA GAAGCTCCTC 2520 

GCTGAGAAGT CGCACCTTCT AGTCGGCATC ACGCGCCACA CCAACCACCT GTACATCCGC 2580 

GACCCGACAG GTGACATTGA GAGACAACTC AACCATAGCG CGAAAGCCGA GGTGTTTACA 2640 

GACATCCCTG CACCCCTGGA GATCACGACT GTCAAACCGA GTGAAGAGGT GCAGCGCAAC 2700 

GAAGTGATGG CAACGATACC CCCGCAGAGT GCCACGCCGC ACGGAGCAAT CCATCTGCTC 2760 

CGCAAGAACT TCGGGGACCA ACCCGACTGT GGCTGTGTCG CTTTGGCGAA GACCGGCTAC 2820 

GAGGTGTTTG GCGGTCGTGC CAAAATCAAC GTAGAGCTTG CCGAACCCGA CGCGACCCCG 2880 

AAGCCGCATA GGGCGTTCCA GGAAGGGGTA CAGTGGGTCA AGGTCACCAA CGCGTCTAAC 2940 

AAACACCAGG CGCTCCAGAC GCTGTTGTCC CGCTACACCA AGCGAAGCGC TGACCTGCCG 3000 

CTACACGAAG CTAAGGAGGA CGTCAAACGC ATGCTAAACT CGCTTGACCG ACATTGGGAC 3060 

TGGACTGTCA CTGAAGACGC CCGTGACCGA GCTGTCTTCG AGACCCAGCT CAAGTTCACC 3120 

CAACGCGGCG GCACCGTCGA AGACCTGCTG GAGCCAGACG ACCCCTACAT CCGTGACATA 3180 

GACTTCCTTA TGAAGACTCA GCAGAAAGTG TCGCCCAAGC CGATCAATAC GGGCAAGGTC 3240 

GGGCAGGGGA TCGCCGCTCA CTCAAAGTCT CTCAACTTCG TCCTCGCCGC TTGGATACGC 3300 

ATACTCGAGG AGATACTCCG TACCGGGAGC CGCACGGTCC GGTACAGCAA CGGTCTCCCC 3360 

GACGAAGAAG AGGCCATGCT GCTCGAAGCG AAGATCAATC AAGTCCCACA CGCCACGTTC 3420 

GTCTCGGCGG ACTGGACCGA GTTTGACACC GCCCACAATA ACACGAGTGA GCTGCTCTTC 3480 

GCCGCCCTTT TAGAGCGCAT CGGCACGCCT GCAGCTGCCG TTAATCTATT CAGAGAACGG 3540 

TGTGGGAAAC GCACCTTGCG AGCGAAGGGT CTAGGCTCCG TTGAAGTCGA CGGTCTGCTC 3600 

GACTCCGGCG CAGCTTGGAC GCCTTGCCGC AACACCATCT TCTCTGCCGC CGTCATGCTC 3660 

ACGCTCTTCC GCGGCGTCAA GTTCGCAGCT TTCAAAGGCG ACGACTCGCT CCTCTGTGGT 3720 

AGCCATTACC TCCGTTTCGA CGCTAGCCGC CTTCACATGG GCGAACGTTA CAAGACCAAA 3780 

CATTTGAAGG TCGAGGTGCA GAAAATCGTG CCGTACATCG GACTCCTCGT CTCCGCTGAG 3840 

CAGGTCGTCC TCGACCCTGT CAGGAGCGCT CTCAAGATAT TTGGGCGCTG CTACACAAGC 3900 

GAACTCCTTT ACTCCAAGTA CGTGGAGGCT GTGAGAGACA TCACCAAGGG CTGGAGTGAC 3960 
GCCCGCTACC ACAGCCTCCT GTGCGACATG TCAGCATGCT ACTACAATTA CGCGCCGGAG 4020 

TCTGCGGCGT ACATCATCGA CGCTGTTGTT CGCTTTGGGC GCGGCGACTT CCCGTTTGAA 4080 

CAACTGCGCG TGGTGCGTGC CCATGTGCAG GCACCCGACG CTTACAGCAG CACGTATCCG 4140 

GCTAACGTGC GCGCATCGTG CCTTGACCAC GTCTTCGAGC CCCGCCAGGC CGGCGCCCCG 4200 

GCAGGTTTCG TTGCGACATG TGCGAAGCCG GAAACGCCTT CTTCACTTAC CGCGAAAGCT 4260 

GGTGTTTCTG CGACTACAAG CCACGTTGCG ACTGGGACTG CGCCCCCGGA GTCTCCATGG 4320 

GATGCACCTG CAGCCAACAG CTTTTCGGAG TTATTGACAC CGGAGACCCC GTCCACATCA 4380 

TCCTCGCCGT CATCGTCTTC ATCGGACTCC TCTACATCGT GTGGAAGGTC GCTCAGTGGT 4440 

GGAGACACCG CAAGGACCAC AGAAGACTTG AACAGCAGAA AGCCGCCTTC GCAAGACAGG 4500 

CAATCACGCT CGTCTGA ATG TCT GGA CAG AAG CGG AGA AAG GAC AGG CAG 4550 
Met Ser Gly Gln Lys Arg Arg Lys Asp Arg Gln 
1 5 10 



TTC GTT AAC TGC CCC CAC TGC TCC GAG CCC CTC ATT CTC ATT TTC GGA 4598 
Phe Val Asn Cys Pro His Cys Ser Glu Pro Leu Ile Leu Ile Phe Gly 
15 20 25 

AAG AGC TCG ACT GGC GAC CGG GCC GAC TGT CGC CGC TGC GAC ATC ACC 4646 
Lys Ser Ser Thr Gly Asp Arg Ala Asp Cys Arg Arg Cys Asp Ile Thr 
30 35 40 

TTC GGC AAC CCC ATC CTG CGC CAC GGA CCA GGT TGC CGC GAG GAC CAC 4694 
Phe Gly Asn Pro Ile Leu Arg His Gly Pro Gly Cys Arg Glu Asp His 
45 50 55 

GCC GGA CTT TGC GCC TTT CCT GGG TTC CCA GTC TGC CCG TGC TGT CTC 4742 
Ala Gly Leu Cys Ala Phe Pro Gly Phe Pro Val Cys Pro Cys Cys Leu 
60 65 70 75 

GAA GCC GTA CCG GCC CCC CAC GAC TGC CCG TTG GAA AGA AGT CAC CCC 4790 
Glu Ala Val Pro Ala Pro His Asp Cys Pro Leu Glu Arg Ser His Pro 
80 85 90 

GCT CCA CGC GTG GAA GGG CGT GAC CGG AGA CCG ACC GGA AGT CAG GGA 4838 
Ala Pro Arg Val Glu Gly Arg Asp Arg Arg Pro Thr Gly Ser Gln Gly 
95 100 105 

GGA CCC GGA GAC AGC GGC GGT CGT CCA GGC TCT GAT CAG CGG CCG TTA 4886 
Gly Pro Gly Asp Ser Gly Gly Arg Pro Gly Ser Asp Gln Arg Pro Leu 
110 115 120 

TCC TCA GAA GAC GAA GCT TTC CTC CGA CGC ATC CAA AGG CTA CTC AAG 4934 
Ser Ser Glu Asp Glu Ala Phe Leu Arg Arg Ile Gln Arg Leu Leu Lys 
125 130 135 

AAC TAAGGGATGC TCACAATCCA CCTCTTTTCC TGCCCCGAGT GCGGATTACC 4987 
Asn 
140 

AGGCCCGCGA CTGCCAGACA GTCCGAGTCT GCCGCGCCGC TGCAGAGATG GCGCGCTCAT 5047 
GTATTCACGA GCCGTTGGCT TCATCTGCCG CCAGTGCCGA CTTGAAGCGC ATACGCTCTA 5107 
CCTCGGACTC TGTTCCCGAT GTAAAGATCA GCAAGAGCGC ATGAAGGAAC AAAATTAGTT 5167 

TCCTTGTTCG TAAACAAGGT GGTCCCTCCC ATTGAGGTAA AGACTCTGGT GAGTCCTCAA 5227 

CGTTACTCGT TGAGTCTGCT GCGGTTCGAT TCCATTCCCA AGCAGCAAAG CGTGCGCAAC 5287 

TAGTACGGCG CCCCCTGGGA TACCA 5312 

(2) INFORMATION FOR SEQ ID NO:44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 140 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: 

Met Ser Gly Gln Lys Arg Arg Lys Asp Arg Gln Phe Val Asn Cys Pro 
1 5 10 15 

His Cys Ser Glu Pro Leu Ile Leu Ile Phe Gly Lys Ser Ser Thr Gly 
20 25 30 

Asp Arg Ala Asp Cys Arg Arg Cys Asp Ile Thr Phe Gly Asn Pro Ile 
35 40 45 

Leu Arg His Gly Pro Gly Cys Arg Glu Asp His Ala Gly Leu Cys Ala 
50 55 60 

Phe Pro Gly Phe Pro Val Cys Pro Cys Cys Leu Glu Ala Val Pro Ala 
65 70 75 80 

Pro His Asp Cys Pro Leu Glu Arg Ser His Pro Ala Pro Arg Val Glu 
85 90 95 

Gly Arg Asp Arg Arg Pro Thr Gly Ser Gln Gly Gly Pro Gly Asp Ser 
100 105 110 

Gly Gly Arg Pro Gly Ser Asp Gln Arg Pro Leu Ser Ser Glu Asp Glu 
115 120 125 

Ala Phe Leu Arg Arg Ile Gln Arg Leu Leu Lys Asn 
130 135 140 
(2) INFORMATION FOR SEQ ID NO:45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5312 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 4944..5162 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: 

GTTCTGCCTC CCCCGGACGG TAAATATAGG GGAACAATGT ACGCGAAAGC GACAGACGTG 60 

GCGCGTGTCT ACGCCGCGGC AGATGTCGCC TACGCGAACG TACTGCAGCA GAGAGCAGTC 120 

AAGTTGGACT TCGCCCCGCC ACTGAAGGCA CTAGAAACCC TCCACAGACT GTACTATCCG 180 

CTGCGCTTCA AAGGGGGCAC TTTACCCCCG ACACAACACC CGATCCTGGC CGGGCACCAA 240 

CGTGTCGCAG AAGAGGTTCT GCACAATTTC GCCAGGGGAC GTAGCACAGT GCTCGAGATA 300 

GGGCCGTCTC TGCACAGCGC ACTTAAGCTA CATGGGGCAC CGAACGCCCC CGTCGCAGAC 360 

TATCACGGGT GCACCAACTA CGGCACCCCC GACGGCTCGC GACACATTAC GGCCTTAGAG 420 

TCTAGATCCG TCGCCACAGG CCGGCCCGAG TTCAAGGCCG ACGCCTCACT GCTCGCCAAC 480 

GGCATTGCCT CCCGCACCTT CTGCGTCGAC GGAGTCGGCT CTTGCGCGTT CAAATCGCGC 540 

GTTGGAATTG CCAATCACTC CCTCTATGAC GTGACCCTAG AGGAGCTGGC CAATGCGTTT 600 

GAGAACCACG GACTTCACAT GGTCCGCGCG TTCATGCACA TGCCAGAAGA GCTGCTCTAC 660 

ATGGACAACG TGGTTAATGC CGAGCTCGGC TACCGCTTCC ACGTTATTGA AGAGCCTATG 720 

GCTGTGAAGG ACTGCGCATT CCAGGGGGGG GACCTCCGTC TCCACTTCCC TGAGTTGGAC 780 

TTCATCAACG AGAGCCAAGA GCGGCGCATC GAGAGGCTGG CCGCCCGCGG CTCCTACTCC 840 

AGACGCGCCG TCATTTTCTC CGGCGACGAC GACTGGGGTG ATGCGTACTT ACACGACTTC 900 

CACACATGGC TCGCCTACCT ACTGGTGAGG AACTACCCCA CTCCGTTTGG TTTCTCACTC 960 

CATATAGAAG TCCAGAGGCG CCACGGCTCC AGCATTGAGC TGCGCATCAC TCGCGCGCCA 1020 

CCTGGAGACC GCATGCTGGC CGTCGTCCCA AGGACGTCCC AAGGCCTCTG CAGAATCCCA 1080 

AACATCTTTT ATTACGCCGA CGCGTCGGGC ACTGAGCATA AGACCATCCT TACGTCACAG 1140 

CACAAAGTCA ACATGCTGCT CAATTTTATG CAAACGCGTC CTGAGAAGGA ACTAGTCGAC 1200 

ATGACCGTCT TGATGTCGTT CGCGCGCGCT AGGCTGCGCG CGATCGTGGT CGCCTCAGAA 1260 
GTCACCGAGA GCTCCTGGAA CATCTCACCG GCTGACCTGG TCCGCACTGT CGTGTCTCTT 1320 

TACGTCCTCC ACATCATCGA GCGCCGAAGG GCTGCGGTCG CTGTCAAGAC CGCCAAGGAC 1380 

GACGTCTTTG GAGAGACTTC GTTCTGGGAG AGTCTCAAGC ACGTCTTGGG CTCCTGTTGC 1440 

GGTCTGCGCA ACCTCAAAGG CACCGACGTC GTCTTTACTA AGCGCGTCGT CGATAAGTAC 1500 

CGAGTCCACT CGCTCGGAGA CATAATCTGC GACGTCCGCC TGTCCCCTGA ACAGGTCGGC 1560 

TTCCTGCCGT CCCGCGTACC ACCTGCCCGC GTCTTTCACG ACAGGGAAGA GCTTGAGGTC 1620 

CTTCGCGAAG CTGGCTGCTA CAACGAACGT CCGGTACCTT CCACTCCTCC TGTGGAGGAG 1680 

CCCCAAGGTT TCGACGCCGA CTTGTGGCAC GCGACCGCGG CCTCACTCCC CGAGTACCGC 1740 

GCCACCTTGC AGGCAGGTCT CAACACCGAC GTCAAGCAGC TCAAGATCAC CCTCGAGAAC 1800 

GCCCTCAAGA CCATCGACGG GCTCACCCTC TCCCCAGTCA GAGGCCTCGA GATGTACGAG 1860 

GGCCCGCCAG GCAGCGGCAA GACGGGCACC CTCATCGCCG CCCTTGAGGC CGCGGGCGGT 1920 

AAAGCACTTT ACGTGGCACC CACCAGAGAA CTGAGAGAGG CTATGGACCG GCGGATCAAA 1980 

CCGCCGTCCG CCTCGGCTAC GCAACATGTC GCCCTTGCGA TTCTCCGTCG TGCCACCGCC 2040 

GAGGGCGCCC CTTTCGCTAC CGTGGTTATC GACGAGTGCT TCATGTTCCC GCTCGTGTAC 2100 

GTCGCGATCG TGCACGCCTT GTCCCCGAGC TCACGAATAG TCCTTGTAGG GGACGTCCAC 2160 

CAAATCGGGT TTATAGACTT CCAAGGCACA AGCGCGAACA TGCCGCTCGT TCGCGACGTC 2220 

GTTAAGCAGT GCCGTCGGCG CACTTTCAAC CAAACCAAGC GCTGTCCGGC CGACGTCGTT 2280 

GCCACCACGT TTTTCCAGAG CTTGTACCCC GGGTGCACAA CCACCTCAGG GTGCGTCGCA 2340 

TCCATCAGCC ACGTCGCCCC AGACTACCGC AACAGCCAGG CGCAAACGCT CTGCTTCACG 2400 

CAGGAGGAAA AGTCGCGCCA CGGGGCTGAG GGCGCGATGA CTGTGCACGA AGCGCAGGGA 2460 

CGCACTTTTG CGTCTGTCAT TCTGCATTAC AACGGCTCCA CAGCAGAGCA GAAGCTCCTC 2520 

GCTGAGAAGT CGCACCTTCT AGTCGGCATC ACGCGCCACA CCAACCACCT GTACATCCGC 2580 

GACCCGACAG GTGACATTGA GAGACAACTC AACCATAGCG CGAAAGCCGA GGTGTTTACA 2640 

GACATCCCTG CACCCCTGGA GATCACGACT GTCAAACCGA GTGAAGAGGT GCAGCGCAAC 2700 

GAAGTGATGG CAACGATACC CCCGCAGAGT GCCACGCCGC ACGGAGCAAT CCATCTGCTC 2760 

CGCAAGAACT TCGGGGACCA ACCCGACTGT GGCTGTGTCG CTTTGGCGAA GACCGGCTAC 2820 

GAGGTGTTTG GCGGTCGTGC CAAAATCAAC GTAGAGCTTG CCGAACCCGA CGCGACCCCG 2880 

AAGCCGCATA GGGCGTTCCA GGAAGGGGTA CAGTGGGTCA AGGTCACCAA CGCGTCTAAC 2940 

AAACACCAGG CGCTCCAGAC GCTGTTGTCC CGCTACACCA AGCGAAGCGC TGACCTGCCG 3000 

CTACACGAAG CTAAGGAGGA CGTCAAACGC ATGCTAAAGT CGCTTGACCG ACATTGGGAC 3060 
TGGACTGTCA CTGAAGACGC CCGTGACCGA GCTGTCTTCG AGACCCAGCT CAAGTTCACC 3120 

CAACGCGGCG GCACCGTCGA AGACCTGCTG GAGCCAGACG ACCCCTACAT CCGTGACATA 3180 

GACTTCCTTA TGAAGACTCA GCAGAAAGTG TCGCCCAAGC CGATCAATAC GGGCAAGGTC 3240 

GGGCAGGGGA TCGCCGCTCA CTCAAAGTCT CTCAACTTCG TCCTCGCCGC TTGGATACGC 3300 

ATACTCGAGG AGATACTCCG TACCGGGAGC CGCACGGTCC GGTACAGCAA CGGTCTCCCC 3360 

GACGAAGAAG AGGCCATGCT GCTCGAAGCG AAGATCAATC AAGTCCCACA CGCCACGTTC 3420 

GTCTCGGCGG ACTGGACCGA GTTTGACACC GCCCACAATA ACACGAGTGA GCTGCTCTTC 3480 

GCCGCCCTTT TAGAGCGCAT CGGCACGCCT GCAGCTGCCG TTAATCTATT CAGAGAACGG 3540 

TGTGGGAAAC GCACCTTGCG AGCGAAGGGT CTAGGCTCCG TTGAAGTCGA CGGTCTGCTC 3600 

GACTCCGGCG CAGCTTGGAC GCCTTGCCGC AACACCATCT TCTCTGCCGC CGTCATGCTC 3660 

ACGCTCTTCC GCGGCGTCAA GTTCGCAGCT TTCAAAGGCG ACGACTCGCT CCTCTGTGGT 3720 

AGCCATTACC TCCGTTTCGA CGCTAGCCGC CTTCACATGG GCGAACGTTA CAAGACCAAA 3780 

CATTTGAAGG TCGAGGTGCA GAAAATCGTG CCGTACATCG GACTCCTCGT CTCCGCTGAG 3840 

CAGGTCGTCC TCGACCCTGT CAGGAGCGCT CTCAAGATAT TTGGGCGCTG CTACACAAGC 3900 

GAACTCCTTT ACTCCAAGTA CGTGGAGGCT GTGAGAGACA TCACCAAGGG CTGGAGTGAC 3960 

GCCCGCTACC ACAGCCTCCT GTGCCACATG TCAGCATGCT ACTACAATTA CGCGCCGGAG 4020 

TCTGCGGCGT ACATCATCGA CGCTGTTGTT CGCTTTGGGC GCGGCGACTT CCCGTTTGAA 4080 

CAACTGCGCG TGGTGCGTGC CCATGTGCAG GCACCCGACG CTTACAGCAG CACGTATCCG 4140 

GCTAACGTGC GCGCATCGTG CCTTGACCAC GTCTTCGAGC CCCGCCAGGC CGCCGCCCCG 4200 

GCAGGTTTCG TTGCGACATG TGCGAAGCCG GAAACGCCTT CTTCACTTAC CGCGAAAGCT 4260 

GGTGTTTCTG CGACTACAAG CCACGTTGCG ACTGGGACTG CGCCCCCGGA GTCTCCATGG 4320 

GATGCACCTG CAGCCAACAG CTTTTCGGAG TTATTGACAC CGGAGACCCC GTCCACATCA 4380 

TCCTCGCCGT CATCGTCTTC ATCGGACTCC TCTACATCGT GTGGAAGGTC GCTCAGTGGT 4440 

GGAGACACCG CAAGGACCAC AGAAGACTTG AACAGCAGAA AGCCGCCTTC GCAAGACAGG 4500 

CAATCACGCT CGTCTGAATG TCTGGACAGA AGCGGAGAAA GGACAGGCAG TTCGTTAACT 4560 

GCCCCCACTG CTCCGAGCCC CTCATTCTCA TTTTCGGAAA GAGCTCGACT GGCGACCGGG 4620 

CCGACTGTCG CCGCTGCGAC ATCACCTTCG GCAACCCCAT CCTGCGCCAC GGACCAGGTT 4680 

GCCGCGAGGA CCACGCCGGA CTTTGCGCCT TTCCTGGGTT CCCAGTCTGC CCGTGCTGTC 4740 

TCGAAGCCGT ACCGGCCCCC CACGACTGCC CGTTGGAAAG AAGTCACCCC GCTCCACGCG 4800 

TGGAAGGGCG TGACCGGAGA CCGACCGGAA GTCAGGGAGG ACCCGGAGAC AGCGGCGGTC 4860 

GTCCAGGCTC TGATCAGCGG CCGTTATCCT CAGAAGACGA AGCTTTCCTC CGACGCATCC 4920 
AAAGGCTACT CAAGAACTAA GGG ATG CTC ACA ATC CAC CTC TTT TCC TGC 4970 

Met Leu Thr Ile His Leu Phe Ser Cys 
1 5 

CCC GAG TGC GGA TTA CCA GGC CCG CGA CTG CCA GAC AGT CCG AGT CTG 5018 
Pro Glu Cys Gly Leu Pro Gly Pro Arg Leu Pro Asp Ser Pro Ser Leu 
10 15 20 25 

CCG CGC CGC TGC AGA GAT GGC GCG CTC ATG TAT TCA CGA GCC GTT GGC 5066 
Pro Arg Arg Cys Arg Asp Gly Ala Leu Met Tyr Ser Arg Ala Val Gly 
30 35 40 

TTC ATC TGC CGC CAG TGC CGA CTT GAA GCG CAT ACG CTC TAC CTC GGA 5114 
Phe Ile Cys Arg Gln Cys Arg Leu Glu Ala His Thr Leu Tyr Leu Gly 
45 50 55 

CTC TGT TCC CGA TGT AAA GAT CAG CAA GAG CGC ATG AAG GAA CAA AAT 5162 
Leu Cys Ser Arg Cys Lys Asp Gln Gln Glu Arg Met Lys Glu Gln Asn 
60 65 70 

TAGTTTCCTT GTTCGTAAAC AAGGTGGTCC CTCCCATTGA GGTAAAGACT CTGGTGAGTC 5222 

CTCAACGTTA CTCGTTGAGT CTGCTGCGGT TCGATTCCAT TCCCAAGCAG CAAAGGGTGC 5282 

GCAACTAGTA CGGCGCCCCC TGGGATACCA 5312 



(2) INFORMATION FOR SEQ ID NO:46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 73 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

Met Leu Thr Ile His Leu Phe Ser Cys Pro Glu Cys Gly Leu Pro Gly 
1 5 10 15 

Pro Arg Leu Pro Asp Ser Pro Ser Leu Pro Arg Arg Cys Arg Asp Gly 
20 25 30 

Ala Leu Met Tyr Ser Arg Ala Val Gly Phe Ile Cys Arg Gln Cys Arg 
35 40 45 

Leu Glu Ala His Thr Leu Tyr Leu Gly Leu Cys Ser Arg Cys Lys Asp 
50 55 60 

Gln Gln Glu Arg Met Lys Glu Gln Asn 
65 70 
(2) INFORMATION FOR SEQ ID NO:47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2478 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 283..753 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 

GTTTTTCTTT CTTTACCAAG TGTGGTAAAA TTTAAACAAA GAAGAAAACC AGGACCGTAA 60 

CCCGGCCCTT ACACACCTCG AGTCCGTGAC CACCGGATTA TACGTCGCCC ACCACACGGC 120 

GCCTTTTCCG ACCACTCTCG AGAGTCGTTG GGAGTTTCGT CCGTGACCAC CCGGTTGGCA 180 

GTCGACAGAC GCTTCCGGAC CACTAGAACC TCCTCGAGCG ACGCACACAC AGCACACACA 240 

CCGCCTTAGC TGCACCTACG GCAGCGTTGA TAGCGCGGAT TT ATG AGC GAG CAC 294 

Met Ser Glu His 
1 

ACC ATC GCC CAC TCC ATC ACA TTA CCA CCC GGT TAC ACC CTT GCC CTA 342 
Thr Ile Ala His Ser Ile Thr Leu Pro Pro Gly Tyr Thr Leu Ala Leu 
5 10 15 20 

ATA CCC CCT GAA CCT GAA GCA GGA TGG GAG ATG CTG GAG TGG CGT CAC 390 
Ile Pro Pro Glu Pro Glu Ala Gly Trp Glu Met Leu Glu Trp Arg His 
25 30 35 

AGC GAC CTC ACA ACC GTC GCG GAA CCC GTA ACG TTC GGG TCA GCG CCA 438 
Ser Asp Leu Thr Thr Val Ala Glu Pro Val Thr Phe Gly Ser Ala Pro 
40 45 50 

ACA CCG TCA CCG TCA ATG GTA GAA GAA ACC AAC GGC GTC GGA CCG GAA 486 
Thr Pro Ser Pro Ser Met Val Glu Glu Thr Asn Gly Val Gly Pro Glu 
55 60 65 

GGC AAG TTT CTC CCC CTG ACA ATT TCA CCG CTG CTG CAC AAG ACC TCG 534 
Gly Lys Phe Leu Pro Leu Thr Ile Ser Pro Leu Leu His Lys Thr Ser 
70 75 80 

CGC AAA GCC TTG ACG CCA ACA CCG TCA CTT TCC CCG CTA ACA TCT CTA 582 
Arg Lys Ala Leu Thr Pro Thr Pro Ser Leu Ser Pro Leu Thr Ser Leu 
85 90 95 100 

GCA TGC CCG AAT TCC GGA ATT GGG CCA AGG GAA AGA TCG ACC TCG ACT 630 
Ala Cys Pro Asn Ser Gly Ile Gly Pro Arg Glu Arg Ser Thr Ser Thr 
105 110 115 
CCG ATT CCA TCG GCT GGT ACT TCA AGT ACC TTG ACC CAG CGG GTG CTA 678 
Pro Ile Pro Ser Ala Gly Thr Ser Ser Thr Leu Thr Gln Arg Val Leu 
120 125 130 

CAG AGT CTG CGC GCG CCG TCG GCG AGT ACT CGA AGA TCC CTG ACG GCC 726 
Gln Ser Leu Arg Ala Pro Ser Ala Ser Thr Arg Arg Ser Leu Thr Ala 
135 140 145 



TCG TCA AGT TCT CCG TCG ACG CAG AGA TAAGAGAGAT CTATAACGAG 773 
Ser Ser Ser Ser Pro Ser Thr Gln Arg 
150 155 

GAGTGCCCCG TCGTCACTGA CGTGTCCGTC CCCCTCGACG GCCGCCAGTG GAGCCTCTCG 833 

ATTTTCTCCT TTCCGATGTT CAGAACCGCC TACGTCGCCG TAGCGAACGT CGAGAACAAG 893 

GAGATGTCGC TCGACGTTGT CAACGACCTC ATCGAGTGGC TCAACAATCT CGCCGACTGG 953 

CGTTATGTCG TTGACTCTGA ACAGTGGATT AACTTCACCA ATGACACCAC GTACTACGTC 1013 

CGCATCCGCG TTCTACGTCC AACCTACGAC GTTCCAGACC CCACAGAGGG CCTTGTTCGC 1073 

ACAGTCTCAG ACTACCGCCT CACTTATAAG GCGATAACAT GTGAAGCCAA CATGCCAACA 1133 

CTCGTCGACC AAGGCTTTTG GATCGGCGGC CAGTACGCTC TCACCCCGAC TAGCCTACCG 1193 

CAGTACGACG TCAGCGAGGC CTACGCTCTG CACACTTTGA CCTTCGCCAG ACCATCCAGC 1253 

GCCGCTGCAC TCGCGTTTGT GTGGGCAGGT TTGCCACAGG GTGGCACTGC GCCTGCAGGC 1313 

ACTCCAGCCT GGGAGCAGCC ATCCTCGGGT GGCTACCTCA CCTGGCGCCA CAACGGTACT 1373 

ACTTTCCCAG CTGGCTCCGT TAGCTACGTT CTCCCTGAGG GTTTCGCCCT TGAGCGCTAC 1433 

GACCCGAACG ACGGCTCTTG GACCGACTTC GCTTCCGCAG GAGACACCGT CACTTTCCGG 1493 

CAGGTCGCCG TCGACGAGGT CGTTGTGACC AACAACCCCG CCGGCGGCGG CAGCGCCCCC 1553 

ACCTTCACCC TGAGAGTGCC CCCTTCAAAC GCTTACACCA ACACCGTGTT TAGGAACACG 1613 

CTCTTAGAGA CTCGACCCTC CTCTCGTAGG CTCGAACTCC CTATGCCACC TGCTGACTTT 1673 

GGACAGACGG TCGCCAACAA CCCGAAGATC GAGCAGTCGC TTCTTAAAGA AACACTTGGC 1733 

TGCTATTTGG TCCACTCCAA AATGCGAAAC CCCGTTTTCC AGCTCACGCC AGCCAGCTCC 1793 

TTTGGCGCCG TTTCCTTCAA CAATCCGGGT TATGAGCGCA CACGCGACCT CCCGGACTAC 1853 

ACTGGCATCC GTGACTCATT CGACCAGAAC ATGTCCACCG CTGTGGCCCA CTTCCGCTCA 1913 

CTCTCCCACT CCTGCAGTAT CGTCACTAAG ACCTACCAGG GTTGGGAAGG CGTCACGAAC 1973 

GTCAACACGC CTTTCGGCCA ATTCGCGCAC GCGGGCCTCC TCAAGAATGA GGAGATCCTC 2033 

TGCCTCGCCG ACGACCTGGC CACCCGTCTC ACAGGTGTCT ACCCCGCCAC TGACAACTTC 2093 

GCGGCCGCCG TTTCTGCCTT CGCCGCGAAC ATGCTGTCCT CCGTGCTGAA GTCGGAGGCA 2153 

ACGTCCTCCA TCATCAACTC CGTTGGCGAG ACTGCCGTCG GCGCGGCTCA GTCCGGCCTC 2213 

GCGAAGCTAC CCGGACTGCT AATGAGTGTA CCAGGGAAGA TTGCCGCGCG TGTCCGCGCG 2273 

CGCCGAGCGC GCCGCCGCGC CGCTCGTGCC AATTAGTTTG CTCGCTCCTG TTTCGCCGTT 2333 

TCGTAAAACG GCGTGGTCCC GCACATTACG CGTACCCTAA AGACTCTGGT GAGTCCCCGT 2393 

CGTTACACGA CGGGTCTGCC GCGGTTCGAT TCCATTCCCA AGCGGCAAGA AGGACGTAGT 2453 

TAGCTCTGCG TCCCTCGGGA TACCA 2478 



(2) INFORMATION FOR SEQ ID NO:48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 157 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: 



Met Ser Glu His Thr Ile Ala His Ser Ile Thr Leu Pro Pro Gly Tyr 
1 5 10 15 

Thr Leu Ala Leu Ile Pro Pro Glu Pro Glu Ala Gly Trp Glu Met Leu 
20 25 30 

Glu Trp Arg His Ser Asp Leu Thr Thr Val Ala Glu Pro Val Thr Phe 
35 40 45 

Gly Ser Ala Pro Thr Pro Ser Pro Ser Met Val Glu Glu Thr Asn Gly 
50 55 60 

Val Gly Pro Glu Gly Lys Phe Leu Pro Leu Thr Ile Ser Pro Leu Leu 
65 70 75 80 

His Lys Thr Ser Arg Lys Ala Leu Thr Pro Thr Pro Ser Leu Ser Pro 
85 90 95 

Leu Thr Ser Leu Ala Cys Pro Asn Ser Gly Ile Gly Pro Arg Glu Arg 
100 105 110 

Ser Thr Ser Thr Pro Ile Pro Ser Ala Gly Thr Ser Ser Thr Leu Thr 
115 120 125 

Gln Arg Val Leu Gln Ser Leu Arg Ala Pro Ser Ala Ser Thr Arg Arg 
130 135 140 

Ser Leu Thr Ala Ser Ser Ser Ser Pro Ser Thr Gln Arg 
145 150 155 

(2) INFORMATION FOR SEQ ID NO:49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2478 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 366..2306 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49: 



GTTTTTCTTT CTTTACCAAG TGTGGTAAAA TTTAAACAAA GAAGAAAACC AGGACCGTAA 60 

CCCGGCCCTT ACACACCTCG AGTCCGTGAC CACCGGATTA TACGTCGCCC ACCACACGGC 120 

GCCTTTTCCG ACCACTCTCG AGAGTCGTTG GGAGTTTCGT CCGTGACCAC CCGGTTGGCA 180 

GTCGACAGAC GCTTCCGGAC CACTAGAACC TCCTCGAGCG ACGCACACAC AGCACACACA 240 

CCGCCTTAGC TGCACCTACG GCAGCGTTGA TAGCGCGGAT TTATGAGCGA GCACACCATC 300 

GCCCACTCCA TCACATTACC ACCCGGTTAC ACCCTTGCCC TAATACCCCC TGAACCTGAA 360 

GCAGG ATG GGA GAT GCT GGA GTG GCG TCA CAG CGA CCT CAC AAC CGT 407 
Met Gly Asp Ala Gly Val Ala Ser Gln Arg Pro His Asn Arg 
1 5 10 

CGC GGA ACC CGT AAC GTT CGG GTC AGC GCC AAC ACC GTC ACC GTC AAT 455 
Arg Gly Thr Arg Asn Val Arg Val Ser Ala Asn Thr Val Thr Val Asn 
15 20 25 30 

GGT AGA AGA AAC CAA CGG CGT CGG ACC GGA AGG CAA GTT TCT CCC CCT 503 
Gly Arg Arg Asn Gln Arg Arg Arg Thr Gly Arg Gln Val Ser Pro Pro 
35 40 45 

GAC AAT TTC ACC GCT GCT GCA CAA GAC CTC GCG CAA AGC CTT GAC GCC 551 
Asp Asn Phe Thr Ala Ala Ala Gln Asp Leu Ala Gln Ser Leu Asp Ala 
50 55 60 

AAC ACC GTC ACT TTC CCC GCT AAC ATC TCT AGC ATG CCC GAA TTC CGG 599 
Asn Thr Val Thr Phe Pro Ala Asn Ile Ser Ser Met Pro Glu Phe Arg 
65 70 75 

AAT TGG GCC AAG GGA AAG ATC GAC CTC GAC TCC GAT TCC ATC GGC TGG 647 
Asn Trp Ala Lys Gly Lys Ile Asp Leu Asp Ser Asp Ser Ile Gly Trp 
80 85 90 

TAC TTC AAG TAC CTT GAC CCA GCG GGT GCT ACA GAG TCT GCG CGC GCC 695 
Tyr Phe Lys Tyr Leu Asp Pro Ala Gly Ala Thr Glu Ser Ala Arg Ala 
95 100 105 110 

GTC GGC GAG TAC TCG AAG ATC CCT GAC GGC CTC GTC AAG TTC TCC GTC 743 
Val Gly Glu Tyr Ser Lys Ile Pro Asp Gly Leu Val Lys Phe Ser Val 
115 120 125 
GAC GCA GAG ATA AGA GAG ATC TAT AAC GAG GAG TGC CCC GTC GTC ACT 791 
Asp Ala Glu Ile Arg Glu Ile Tyr Asn Glu Glu Cys Pro Val Val Thr 
130 135 140 

GAC GTG TCC GTC CCC CTC GAC GGC CGC CAG TGG AGC CTC TCG ATT TTC 839 
Asp Val Ser Val Pro Leu Asp Gly Arg Gln Trp Ser Leu Ser Ile Phe 
145 150 155 

TCC TTT CCG ATG TTC AGA ACC GCC TAC GTC GCC GTA GCG AAC GTC GAG 887 
Ser Phe Pro Met Phe Arg Thr Ala Tyr Val Ala Val Ala Asn Val Glu 
160 165 170 

AAC AAG GAG ATG TCG CTC GAC GTT GTC AAC GAC CTC ATC GAG TGG CTC 935 
Asn Lys Glu Met Ser Leu Asp Val Val Asn Asp Leu Ile Glu Trp Leu 
175 180 185 190 

AAC AAT CTC GCC GAC TGG CGT TAT GTC GTT GAC TCT GAA CAG TGG ATT 983 
Asn Asn Leu Ala Asp Trp Arg Tyr Val Val Asp Ser Glu Gln Trp Ile 
195 200 205 

AAC TTC ACC AAT GAC ACC ACG TAC TAC GTC CGC ATC CGC GTT CTA CGT 1031 
Asn Phe Thr Asn Asp Thr Thr Tyr Tyr Val Arg Ile Arg Val Leu Arg 
210 215 220 

CCA ACC TAC GAC GTT CCA GAC CCC ACA GAG GGC CTT GTT CGC ACA GTC 1079 
Pro Thr Tyr Asp Val Pro Asp Pro Thr Glu Gly Leu Val Arg Thr Val 
225 230 235 

TCA GAC TAC CGC CTC ACT TAT AAG GCG ATA ACA TGT GAA GCC AAC ATG 1127 
Ser Asp Tyr Arg Leu Thr Tyr Lys Ala Ile Thr Cys Glu Ala Asn Met 
240 245 250 

CCA ACA CTC GTC GAC CAA GGC TTT TGG ATC GGC GGC CAG TAC GCT CTC 1175 
Pro Thr Leu Val Asp Gln Gly Phe Trp Ile Gly Gly Gln Tyr Ala Leu 
255 260 265 270 

ACC CCG ACT AGC CTA CCG CAG TAC GAC GTC AGC GAG GCC TAC GCT CTG 1223 
Thr Pro Thr Ser Leu Pro Gln Tyr Asp Val Ser Glu Ala Tyr Ala Leu 
275 280 285 

CAC ACT TTG ACC TTC GCC AGA CCA TCC AGC GCC GCT GCA CTC GCG TTT 1271 
His Thr Leu Thr Phe Ala Arg Pro Ser Ser Ala Ala Ala Leu Ala Phe 
290 295 300 

GTG TGG GCA GGT TTG CCA CAG GGT GGC ACT GCG CCT GCA GGC ACT CCA 1319 
Val Trp Ala Gly Leu Pro Gln Gly Gly Thr Ala Pro Ala Gly Thr Pro 
305 310 315 

GCC TGG GAG CAG GCA TCC TCG GGT GGC TAC CTC ACC TGG CGC CAC AAC 1367 
Ala Trp Glu Gln Ala Ser Ser Gly Gly Tyr Leu Thr Trp Arg His Asn 
320 325 330 

GGT ACT ACT TTC CCA GCT GGC TCC GTT AGC TAC GTT CTC CCT GAG GGT 1415 
Gly Thr Thr Phe Pro Ala Gly Ser Val Ser Tyr Val Leu Pro Glu Gly 
335 340 345 350 

TTC GCC CTT GAG CGC TAC GAC CCG AAC GAC GGC TCT TGG ACC GAC TTC 1463 
Phe Ala Leu Glu Arg Tyr Asp Pro Asn Asp Gly Ser Trp Thr Asp Phe 
355 360 365 
GCT TCC GCA GGA GAC ACC GTC ACT TTC CGG CAG GTC GCC GTC GAC GAG 1511 
Ala Ser Ala Gly Asp Thr Val Thr Phe Arg Gln Val Ala Val Asp Glu 
370 375 380 

GTC GTT GTG ACC AAC AAC CCC GCC GGC GGC GGC AGC GCC CCC ACC TTC 1559 
Val Val Val Thr Asn Asn Pro Ala Gly Gly Gly Ser Ala Pro Thr Phe 
385 390 395 

ACC GTG AGA GTG CCC CCT TCA AAC GCT TAC ACC AAC ACC GTG TTT AGG 1607 
Thr Val Arg Val Pro Pro Ser Asn Ala Tyr Thr Asn Thr Val Phe Arg 
400 405 410 

AAC ACG CTC TTA GAG ACT CGA CCC TCC TCT CGT AGG CTC GAA CTC CCT 1655 
Asn Thr Leu Leu Glu Thr Arg Pro Ser Ser Arg Arg Leu Glu Leu Pro 
415 420 425 430 

ATG CCA CCT GCT GAC TTT GGA CAG ACG GTC GCC AAC AAC CCG AAG ATC 1703 
Met Pro Pro Ala Asp Phe Gly Gln Thr Val Ala Asn Asn Pro Lys Ile 
435 440 445 

GAG CAG TCG CTT CTT AAA GAA ACA CTT GGC TGC TAT TTG GTC CAC TCC 1751 
Glu Gln Ser Leu Leu Lys Glu Thr Leu Gly Cys Tyr Leu Val His Ser 
450 455 460 

AAA ATG CGA AAC CCC GTT TTC CAG CTC ACG CCA GCC AGC TCC TTT GGC 1799 
Lys Met Arg Asn Pro Val Phe Gln Leu Thr Pro Ala Ser Ser Phe Gly 
465 470 475 

GCC GTT TCC TTC AAC AAT CCG GGT TAT GAG CGC ACA CGC GAC CTC CCG 1847 
Ala Val Ser Phe Asn Asn Pro Gly Tyr Glu Arg Thr Arg Asp Leu Pro 
480 485 490 

GAC TAC ACT GGC ATC CGT GAC TCA TTC GAC CAG AAC ATG TCC ACC GCT 1895 
Asp Tyr Thr Gly Ile Arg Asp Ser Phe Asp Gln Asn Met Ser Thr Ala 
495 500 505 510 

GTG GCC CAC TTC CGC TCA CTC TCC CAC TCC TGC AGT ATC GTC ACT AAG 1943 
Val Ala His Phe Arg Ser Leu Ser His Ser Cys Ser Ile Val Thr Lys 
515 520 525 

ACC TAC CAG GGT TGG GAA GGC GTC ACG AAC GTC AAC ACG CCT TTC GGC 1991 
Thr Tyr Gln Gly Trp Glu Gly Val Thr Asn Val Asn Thr Pro Phe Gly 
530 535 540 

CAA TTC GCG CAC GCG GGC CTC CTC AAG AAT GAG GAG ATC CTC TGC CTC 2039 
Gln Phe Ala His Ala Gly Leu Leu Lys Asn Glu Glu Ile Leu Cys Leu 
545 550 555 

GCC GAC GAC CTG GCC ACC CGT CTC ACA GGT GTC TAC CCC GCC ACT GAC 2087 
Ala Asp Asp Leu Ala Thr Arg Leu Thr Gly Val Tyr Pro Ala Thr Asp 
560 565 570 

AAC TTC GCG GCC GCC GTT TCT GCC TTC GCC GCG AAC ATG CTG TCC TCC 2135 
Asn Phe Ala Ala Ala Val Ser Ala Phe Ala Ala Asn Met Leu Ser Ser 
575 580 585 590 

GTG CTG AAG TCG GAG GCA ACG TCC TCC ATC ATC AAG TCC GTT GGC GAG 2183 
Val Leu Lys Ser Glu Ala Thr Ser Ser Ile Ile Lys Ser Val Gly Glu 
595 600 605 
ACT GCC GTC GGC GCG GCT CAG TCC GGC CTC GCG AAG CTA CCC GGA CTG 2231 
Thr Ala Val Gly Ala Ala Gln Ser Gly Leu Ala Lys Leu Pro Gly Leu 
610 615 620 

CTA ATG AGT GTA CCA GGG AAG ATT GCC GCG CGT GTC CGC GCG CGC CGA 2279 
Leu Met Ser Val Pro Gly Lys Ile Ala Ala Arg Val Arg Ala Arg Arg 
625 630 635 

GCG CGC CGC CGC GCC GCT CGT GCC AAT TAGTTTGCTC GCTCCTGTTT 2326 
Ala Arg Arg Arg Ala Ala Arg Ala Asn 
640 645 

CGCCGTTTCG TAAAACGGCG TGGTCCCGCA CATTACGCGT ACCCTAAAGA CTCTGGTGAG 2386 

TCCCCGTCGT TACACGACGG GTCTGCCGCG GTTCGATTCC ATTCCCAAGC GGCAAGAAGG 2446 

ACGTAGTTAG CTCTGCGTCC CTCGGGATAC CA 2478 



(2) INFORMATION FOR SEQ ID NO:50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 647 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50: 

Met Gly Asp Ala Gly Val Ala Ser Gln Arg Pro His Asn Arg Arg Gly 
1 5 10 15 

Thr Arg Asn Val Arg Val Ser Ala Asn Thr Val Thr Val Asn Gly Arg 
20 25 30 

Arg Asn Gln Arg Arg Arg Thr Gly Arg Gln Val Ser Pro Pro Asp Asn 
35 40 45 

Phe Thr Ala Ala Ala Gln Asp Leu Ala Gln Ser Leu Asp Ala Asn Thr 
50 55 60 

Val Thr Phe Pro Ala Asn Ile Ser Ser Met Pro Glu Phe Arg Asn Trp 
65 70 75 80 

Ala Lys Gly Lys Ile Asp Leu Asp Ser Asp Ser Ile Gly Trp Tyr Phe 
85 90 95 

Lys Tyr Leu Asp Pro Ala Gly Ala Thr Glu Ser Ala Arg Ala Val Gly 
100 105 110 

Glu Tyr Ser Lys Ile Pro Asp Gly Leu Val Lys Phe Ser Val Asp Ala 
115 120 125 

Glu Ile Arg Glu Ile Tyr Asn Glu Glu Cys Pro Val Val Thr Asp Val 
130 135 140 

Ser Val Pro Leu Asp Gly Arg Gln Trp Ser Leu Ser Ile Phe Ser Phe 
145 150 155 160 

Pro Met Phe Arg Thr Ala Tyr Val Ala Val Ala Asn Val Glu Asn Lys 
165 170 175 

Glu Met Ser Leu Asp Val Val Asn Asp Leu Ile Glu Trp Leu Asn Asn 
180 185 190 

Leu Ala Asp Trp Arg Tyr Val Val Asp Ser Glu Gln Trp Ile Asn Phe 
195 200 205 

Thr Asn Asp Thr Thr Tyr Tyr Val Arg Ile Arg Val Leu Arg Pro Thr 
210 215 220 

Tyr Asp Val Pro Asp Pro Thr Glu Gly Leu Val Arg Thr Val Ser Asp 
225 230 235 240 

Tyr Arg Leu Thr Tyr Lys Ala Ile Thr Cys Glu Ala Asn Met Pro Thr 
245 250 255 

Leu Val Asp Gln Gly Phe Trp Ile Gly Gly Gln Tyr Ala Leu Thr Pro 
260 265 270 

Thr Ser Leu Pro Gln Tyr Asp Val Ser Glu Ala Tyr Ala Leu His Thr 
275 280 285 

Leu Thr Phe Ala Arg Pro Ser Ser Ala Ala Ala Leu Ala Phe Val Trp 
290 295 300 

Ala Gly Leu Pro Gln Gly Gly Thr Ala Pro Ala Gly Thr Pro Ala Trp 
305 310 315 320 

Glu Gln Ala Ser Ser Gly Gly Tyr Leu Thr Trp Arg His Asn Gly Thr 
325 330 335 

Thr Phe Pro Ala Gly Ser Val Ser Tyr Val Leu Pro Glu Gly Phe Ala 
340 345 350 

Leu Glu Arg Tyr Asp Pro Asn Asp Gly Ser Trp Thr Asp Phe Ala Ser 
355 360 365 

Ala Gly Asp Thr Val Thr Phe Arg Gln Val Ala Val Asp Glu Val Val 
370 375 380 

Val Thr Asn Asn Pro Ala Gly Gly Gly Ser Ala Pro Thr Phe Thr Val 
385 390 395 400 

Arg Val Pro Pro Ser Asn Ala Tyr Thr Asn Thr Val Phe Arg Asn Thr 
405 410 415 

Leu Leu Glu Thr Arg Pro Ser Ser Arg Arg Leu Glu Leu Pro Met Pro 
420 425 430 

Pro Ala Asp Phe Gly Gln Thr Val Ala Asn Asn Pro Lys Ile Glu Gln 
435 440 445 

Ser Leu Leu Lys Glu Thr Leu Gly Cys Tyr Leu Val His Ser Lys Met 
450 455 460 

Arg Asn Pro Val Phe Gln Leu Thr Pro Ala Ser Ser Phe Gly Ala Val 
465 470 475 480 

Ser Phe Asn Asn Pro Gly Tyr Glu Arg Thr Arg Asp Leu Pro Asp Tyr 
485 490 495 

Thr Gly Ile Arg Asp Ser Phe Asp Gln Asn Met Ser Thr Ala Val Ala 
500 505 510 

His Phe Arg Ser Leu Ser His Ser Cys Ser Ile Val Thr Lys Thr Tyr 
515 520 525 

Gln Gly Trp Glu Gly Val Thr Asn Val Asn Thr Pro Phe Gly Gln Phe 
530 535 540 

Ala His Ala Gly Leu Leu Lys Asn Glu Glu Ile Leu Cys Leu Ala Asp 
545 550 555 560 

Asp Leu Ala Thr Arg Leu Thr Gly Val Tyr Pro Ala Thr Asp Asn Phe 
565 570 575 

Ala Ala Ala Val Ser Ala Phe Ala Ala Asn Met Leu Ser Ser Val Leu 
580 585 590 

Lys Ser Glu Ala Thr Ser Ser Ile Ile Lys Ser Val Gly Glu Thr Ala 
595 600 605 

Val Gly Ala Ala Gln Ser Gly Leu Ala Lys Leu Pro Gly Leu Leu Met 
610 615 620 

Ser Val Pro Gly Lys Ile Ala Ala Arg Val Arg Ala Arg Arg Ala Arg 
625 630 635 640 

Arg Arg Ala Ala Arg Ala Asn 
645 



(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2479 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 283..2307

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:

GTTTTTCTTT CTTTACCAAG TGTGGTAAAA TTTAAACAAA GAAGAAAACC AGGACCGTAA 60 

CCCGGCCCTT ACACACCTCG AGTCCGTGAC CACCGGATTA TACGTCGCCC ACCACACGGC 120 

GCCTTTTCCG ACCACTCTCG AGAGTCGTTG GGAGTTTCGT CCGTGACCAC CCGGTTGGCA 180 

GTCGACAGAC GCTTCCGGAC CACTAGAACC TCCTCGAGCG ACGCACACAC AGCACACACA 240 
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CCGCCTTAGC TGCACCTACG GCAGCGTTGA TAGCGCGGAT TT ATG AGC GAG CAC 294 

Met Ser Glu His
1 

ACC ATC GCC CAC TCC ATC ACA TTA CCA CCC GGT TAG ACC CTT GCC CTA 342 
Thr Ile Ala His Ser Ile Thr Leu Pro Pro Gly Tyr Thr Leu Ala Leu
5 10 15 20 

ATA CCC CCT GAA CCT GAA GCA GGA TGG GAG ATG CTG GAG TGG CGT CAC 390 
Ile Pro Pro Glu Pro Glu Ala Gly Trp Glu Met Leu Glu Trp Arg His
25 30 35 

AGC GAC CTC ACA ACC GTC GCG GAA CCC GTA ACG TTC GGG TCA GCG CCA 438 
Ser Asp Leu Thr Thr Val Ala Glu Pro Val Thr Phe Gly Ser Ala Pro 
40 45 50 

ACA CCG TCA CCG TCA ATG GTA GAA GAA ACC AAC GGC GTC GGA CCG GAA 486 
Thr Pro Ser Pro Ser Met Val Glu Glu Thr Asn Gly Val Gly Pro Glu 
55 60 65 

GGC AAC TTT CTC CCC CTG ACA ATT TCA CCG CTG CTG CAC AAG ACC TCC 534 
Gly Lys Phe Leu Pro Leu Thr Ile Ser Pro Leu Leu His Lys Thr Ser
70 75 80 

CGC AAA GCC TTG ACG CCA ACA CCG TCA CTT TCC CCC GCT AAC ATC TCT 582 
Arg Lys Ala Leu Thr Pro Thr Pro Ser Leu Ser Pro Ala Asn Ile Ser
85 90 95 100 

AGC ATG CCC GAA TTC CCG AAT TGG GCC AAG GGA AAG ATC GAC CTC GAC 630 
Ser Met Pro Glu Phe Arg Asn Trp Ala Lys Gly Lys Ile Asp Leu Asp
105 110 115 

TCC GAT TCC ATC GGC TGG TAG TTC AAG TAG CTT GAC CCA GCG GGT GCT 678 
Ser Asp Ser Ile Gly Trp Tyr Phe Lys Tyr Leu Asp Pro Ala Gly Ala
120 125 130

ACA GAG TCT GCG CGC GCC GTC GGC GAG TAG TGG AAG ATC CCT GAC GGC 726 
Thr Glu Ser Ala Arg Ala Val Gly Glu Tyr Ser Lys Ile Pro Asp Gly
135 140 145 

CTC GTC AAG TTC TCC GTC GAC GCA GAG ATA AGA GAG ATC TAT AAC GAG 774 
Leu Val Lys Phe Ser Val Asp Ala Glu Ile Arg Glu Ile Tyr Asn Glu
150 155 160 

GAG TGC CCC GTC GTC ACT GAC GTG TCC GTC CCC CTC GAC GGC CGC GAG 822 
Glu Cys Pro Val Val Thr Asp Val Ser Val Pro Leu Asp Gly Arg Gln
165 170 175 180 

TGG AGC CTC TGG ATT TTC TCC TTT CCG ATG TTC AGA ACC GCC TAG GTC 870 
Trp Ser Leu Ser Ile Phe Ser Phe Pro Met Phe Arg Thr Ala Tyr Val
185 190 195 

GCC GTA GCG AAC GTC GAG AAC AAG GAG ATG TGG CTC GAC CTT GTC AAC 918 
Ala Val Ala Asn Val Glu Asn Lys Glu Met Ser Leu Asp Val Val Asn 
200 205 210 

GAC CTC ATC GAG TGG CTC AAC AAT CTC GCC GAC TGG CGT TAT GTC GTT 966 
Asp Leu Ile Glu Trp Leu Asn Asn Leu Ala Asp Trp Arg Tyr Val Val
215 220 225 
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GAC TCT GAA GAG TGG ATT AAC TTC ACC AAT GAG ACG AGG TAG TAG GTG 1014 
Asp Ser Glu Gln Trp Ile Asn Phe Thr Asn Asp Thr Thr Tyr Tyr Val
230 235 240 

GGG ATG GGC GTT GTA GGT GGA AGG TAG GAG GTT GGA GAG GGG AGA GAG 1062 
Arg Ile Arg Val Leu Arg Pro Thr Tyr Asp Val Pro Asp Pro Thr Glu
245 250 255 260 

GGG GTT GTT GGG AGA GTG TGA GAG TAG GGG GTG AGT TAT AAG GGG ATA 1110 
Gly Leu Val Arg Thr Val Ser Asp Tyr Arg Leu Thr Tyr Lys Ala Ile
265 270 275 

AGA TGT GAA GGG AAG ATG GGA AGA GTG GTG GAG GAA GGC TTT TGG ATC 1158 
Thr Cys Glu Ala Asn Met Pro Thr Leu Val Asp Gln Gly Phe Trp Ile
280 285 290 

GGG GGG GAG TAG GGT GTG ACG GGG ACT AGG GTA GGG CAG TAG GAC GTG 1206 
Gly Gly Gln Tyr Ala Leu Thr Pro Thr Ser Leu Pro Gln Tyr Asp Val
295 300 305

AGG GAG GGG TAG GGT GTG GAC ACT TTG ACG TTC GGG AGA CCA TGG AGG 1254 
Ser Glu Ala Tyr Ala Leu His Thr Leu Thr Phe Ala Arg Pro Ser Ser 
310 315 320 

GCC GCT GGA GTG GGG TTT GTG TGG GGA GGT TTG GGA GAG GGT GGC ACT 1302 
Ala Ala Ala Leu Ala Phe Val Trp Ala Gly Leu Pro Gln Gly Gly Thr
325 330 335 340 

GGG GCT GGA GGG AGT GGA GGG TGG GAG GAG GGA TGG TGG GGT GGC TAG 1350 
Ala Pro Ala Gly Thr Pro Ala Trp Glu Gln Ala Ser Ser Gly Gly Tyr
345 350 355 

GTG ACC TGG GGG GAG AAC GGT AGT AGT TTG CCA GCT GGG TGG GTT AGG 1398 
Leu Thr Trp Arg His Asn Gly Thr Thr Phe Pro Ala Gly Ser Val Ser 
360 365 370 

TAG GTT CTC GCT GAG GGT TTG GGC GTT GAG GGC TAG GAC GGG AAG GAG 1446 
Tyr Val Leu Pro Glu Gly Phe Ala Leu Glu Arg Tyr Asp Pro Asn Asp 
375 380 385 

GGC TCT TGG ACC GAC TTC GCT TGG GGA GGA GAG AGG GTC AGT TTG GGG 1494 
Gly Ser Trp Thr Asp Phe Ala Ser Ala Gly Asp Thr Val Thr Phe Arg 
390 395 400 

GAG GTC GGG GTC GAC GAG GTC GTT GTG AGG AAG AAG GGC GGC GGC GGC 1542 
Gln Val Ala Val Asp Glu Val Val Val Thr Asn Asn Pro Ala Gly Gly
405 410 415 420 

GGG AGG GCC GCC ACC TTG ACC GTG AGA GTG GCC GCT TGA AAG GCT TAG 1590 
Gly Ser Ala Pro Thr Phe Thr Val Arg Val Pro Pro Ser Asn Ala Tyr 
425 430 435 

ACC AAC ACC GTG TTT AGG AAG ACG CTC TTA GAG AGT GGA GCC TCC TGT 1638 
Thr Asn Thr Val Phe Arg Asn Thr Leu Leu Glu Thr Arg Pro Ser Ser 
440 445 450 

CGT AGG CTC GAA CTC GCT ATG CCA GCT GCT GAC TTT GGA GAG ACG GTG 1686 
Arg Arg Leu Glu Leu Pro Met Pro Pro Ala Asp Phe Gly Gln Thr Val
455 460 465 
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GCC AAC AAC CCG AAG ATC GAG CAG TCG CTT CTT AAA GAA ACA CTT GGC 1734
Ala Asn Asn Pro Lys Ile Glu Gln Ser Leu Leu Lys Glu Thr Leu Gly
470 475 480 

TGC TAT TTG GTC CAC TCC AAA ATG CGA AAC CCC GTT TTC CAG CTC ACG 1782 
Cys Tyr Leu Val His Ser Lys Met Arg Asn Pro Val Phe Gln Leu Thr
485 490 495 500 

CCA GCC AGC TCC TTT GGC GCC GTT TCC TTC AAC AAT CCG GGT TAT GAG 1830 
Pro Ala Ser Ser Phe Gly Ala Val Ser Phe Asn Asn Pro Gly Tyr Glu 
505 510 515 

CGC ACA GGC GAC CTC CCG GAC TAG ACT GGC ATC CGT GAC TCA TTC GAC 1878 
Arg Thr Arg Asp Leu Pro Asp Tyr Thr Gly Ile Arg Asp Ser Phe Asp
520 525 530 

CAG AAC ATG TCC ACC GGT GTG GCC CAC TTC CGC TCA CTC TCC CAC TCC 1926 
Gln Asn Met Ser Thr Ala Val Ala His Phe Arg Ser Leu Ser His Ser
535 540 545 

TGC ACT ATC GTC ACT AAG ACC TAG CAG GGT TGG GAA GGC GTC ACG AAC 1974 
Cys Ser Ile Val Thr Lys Thr Tyr Gln Gly Trp Glu Gly Val Thr Asn
550 555 560

GTC AAC ACG CGT TTC GGC GAA TTC GCG CAC GCG GGC CTC CTC AAG AAT 2022 
Val Asn Thr Pro Phe Gly Gln Phe Ala His Ala Gly Leu Leu Lys Asn
565 570 575 580 

GAG GAG ATC CTC TGC CTC GCC GAC GAC CTG GCC ACC CGT CTC ACA GGT 2070 
Glu Glu Ile Leu Cys Leu Ala Asp Asp Leu Ala Thr Arg Leu Thr Gly
585 590 595 

GTC TAG CCC GCC ACT GAC AAC TTC GCG GCC GCC GTT TCT GCC TTC GCC 2118 
Val Tyr Pro Ala Thr Asp Asn Phe Ala Ala Ala Val Ser Ala Phe Ala 
600 605 610 

GCG AAC ATG CTG TCC TCC GTG CTG AAG TCG GAG CCA ACG TCC TCC ATC 2166 
Ala Asn Met Leu Ser Ser Val Leu Lys Ser Glu Ala Thr Ser Ser Ile
615 620 625 

ATC AAG TCC GTT GGC GAG ACT GCC GTC GGC GCG GCT CAG TCC GGC CTC 2214 
Ile Lys Ser Val Gly Glu Thr Ala Val Gly Ala Ala Gln Ser Gly Leu
630 635 640 

GCG AAG CTA CCC GGA CTG CTA ATG ACT GTA CCA GGG AAG ATT GCC GCG 2262 
Ala Lys Leu Pro Gly Leu Leu Met Ser Val Pro Gly Lys Ile Ala Ala
645 650 655 660 

CGT GTC CGC GCG CGC CGA GCG CGC CGC CGC GCC GCT CGT GCC AAT 2307 
Arg Val Arg Ala Arg Arg Ala Arg Arg Arg Ala Ala Arg Ala Asn 
665 670 675 

TAGTTTGCTC GCTCCTGTTT CGCCGTTTCG TAAAACGGCG TGGTCCCGCA CATTACGCGT 2367 

ACCCTAAAGA CTCTGGTGAG TCCCCGTCGT TACACGACGG GTCTGCCGCG GTTCGATTCC 2427 

ATTCCCAAGC GGCAAGAAGG ACGTAGTTAG CTCTGCGTCC CTCGGGATAC CA 2479 
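The CDS annotation of SEQ ID NO: 51 (283..2307) can be spot-checked by translating the listed codons with the standard genetic code. The Python sketch below is illustrative only and is not part of the disclosed method; the `translate` helper and the way the codon table is built are assumptions of this example. Translating the first 15 codons from position 283 should give the first 15 residues of the deduced protein (Met Ser Glu His Thr Ile Ala His Ser Ile Thr Leu Pro Pro Gly). The same check flags codons in this copy that disagree with the residue printed beneath them; for example, several TAG triplets (a stop codon) sit under Tyr, which is consistent with a misreading of TAC.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with each position cycling through T, C, A, G; AMINO lists the encoded
# residues in that order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str) -> str:
    """Translate DNA codon by codon, ignoring the spaces used in listings."""
    dna = "".join(dna.split()).upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

# First 15 codons of the CDS beginning at nucleotide 283 of SEQ ID NO: 51
start = "ATG AGC GAG CAC ACC ATC GCC CAC TCC ATC ACA TTA CCA CCC GGT"
print(translate(start))           # MSEHTIAHSITLPPG
print(translate("TAG TAC"))       # *Y  (stop versus Tyr)
```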
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(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 675 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 

Met Ser Glu His Thr Ile Ala His Ser Ile Thr Leu Pro Pro Gly Tyr
1 5 10 15

Thr Leu Ala Leu Ile Pro Pro Glu Pro Glu Ala Gly Trp Glu Met Leu
20 25 30

Glu Trp Arg His Ser Asp Leu Thr Thr Val Ala Glu Pro Val Thr Phe
35 40 45

Gly Ser Ala Pro Thr Pro Ser Pro Ser Met Val Glu Glu Thr Asn Gly 
50 55 60 

Val Gly Pro Glu Gly Lys Phe Leu Pro Leu Thr Ile Ser Pro Leu Leu
65 70 75 80

His Lys Thr Ser Arg Lys Ala Leu Thr Pro Thr Pro Ser Leu Ser Pro
85 90 95

Ala Asn Ile Ser Ser Met Pro Glu Phe Arg Asn Trp Ala Lys Gly Lys
100 105 110

Ile Asp Leu Asp Ser Asp Ser Ile Gly Trp Tyr Phe Lys Tyr Leu Asp
115 120 125

Pro Ala Gly Ala Thr Glu Ser Ala Arg Ala Val Gly Glu Tyr Ser Lys
130 135 140

Ile Pro Asp Gly Leu Val Lys Phe Ser Val Asp Ala Glu Ile Arg Glu
145 150 155 160

Ile Tyr Asn Glu Glu Cys Pro Val Val Thr Asp Val Ser Val Pro Leu
165 170 175

Asp Gly Arg Gln Trp Ser Leu Ser Ile Phe Ser Phe Pro Met Phe Arg
180 185 190

Thr Ala Tyr Val Ala Val Ala Asn Val Glu Asn Lys Glu Met Ser Leu 
195 200 205 

Asp Val Val Asn Asp Leu Ile Glu Trp Leu Asn Asn Leu Ala Asp Trp
210 215 220

Arg Tyr Val Val Asp Ser Glu Gln Trp Ile Asn Phe Thr Asn Asp Thr
225 230 235 240

Thr Tyr Tyr Val Arg Ile Arg Val Leu Arg Pro Thr Tyr Asp Val Pro
245 250 255

Asp Pro Thr Glu Gly Leu Val Arg Thr Val Ser Asp Tyr Arg Leu Thr
260 265 270
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Tyr Lys Ala Ile Thr Cys Glu Ala Asn Met Pro Thr Leu Val Asp Gln
275 280 285

Gly Phe Trp Ile Gly Gly Gln Tyr Ala Leu Thr Pro Thr Ser Leu Pro
290 295 300

Gln Tyr Asp Val Ser Glu Ala Tyr Ala Leu His Thr Leu Thr Phe Ala
305 310 315 320

Arg Pro Ser Ser Ala Ala Ala Leu Ala Phe Val Trp Ala Gly Leu Pro
325 330 335

Gln Gly Gly Thr Ala Pro Ala Gly Thr Pro Ala Trp Glu Gln Ala Ser
340 345 350

Ser Gly Gly Tyr Leu Thr Trp Arg His Asn Gly Thr Thr Phe Pro Ala 
355 360 365 

Gly Ser Val Ser Tyr Val Leu Pro Glu Gly Phe Ala Leu Glu Arg Tyr 
370 375 380 

Asp Pro Asn Asp Gly Ser Trp Thr Asp Phe Ala Ser Ala Gly Asp Thr 
385 390 395 400 

Val Thr Phe Arg Gln Val Ala Val Asp Glu Val Val Val Thr Asn Asn
405 410 415 

Pro Ala Gly Gly Gly Ser Ala Pro Thr Phe Thr Val Arg Val Pro Pro 
420 425 430 

Ser Asn Ala Tyr Thr Asn Thr Val Phe Arg Asn Thr Leu Leu Glu Thr 
435 440 445 

Arg Pro Ser Ser Arg Arg Leu Glu Leu Pro Met Pro Pro Ala Asp Phe 
450 455 460 

Gly Gln Thr Val Ala Asn Asn Pro Lys Ile Glu Gln Ser Leu Leu Lys
465 470 475 480

Glu Thr Leu Gly Cys Tyr Leu Val His Ser Lys Met Arg Asn Pro Val
485 490 495

Phe Gln Leu Thr Pro Ala Ser Ser Phe Gly Ala Val Ser Phe Asn Asn
500 505 510

Pro Gly Tyr Glu Arg Thr Arg Asp Leu Pro Asp Tyr Thr Gly Ile Arg
515 520 525

Asp Ser Phe Asp Gln Asn Met Ser Thr Ala Val Ala His Phe Arg Ser
530 535 540

Leu Ser His Ser Cys Ser Ile Val Thr Lys Thr Tyr Gln Gly Trp Glu
545 550 555 560

Gly Val Thr Asn Val Asn Thr Pro Phe Gly Gln Phe Ala His Ala Gly
565 570 575

Leu Leu Lys Asn Glu Glu Ile Leu Cys Leu Ala Asp Asp Leu Ala Thr
580 585 590
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Arg Leu Thr Gly Val Tyr Pro Ala Thr Asp Asn Phe Ala Ala Ala Val
595 600 605

Ser Ala Phe Ala Ala Asn Met Leu Ser Ser Val Leu Lys Ser Glu Ala
610 615 620

Thr Ser Ser Ile Ile Lys Ser Val Gly Glu Thr Ala Val Gly Ala Ala
625 630 635 640

Gln Ser Gly Leu Ala Lys Leu Pro Gly Leu Leu Met Ser Val Pro Gly
645 650 655

Lys Ile Ala Ala Arg Val Arg Ala Arg Arg Ala Arg Arg Arg Ala Ala
660 665 670

Arg Ala Asn 
675 
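The feature table of SEQ ID NO: 51 and the length field of SEQ ID NO: 52 agree: the coding region 283..2307 spans 2307 - 283 + 1 = 2025 nucleotides, or 675 codons, matching the stated length of 675 amino acids (the TAG stop codon at 2308-2310 lies just outside the annotated range). A minimal Python sketch of that check follows, together with a helper that converts the three-letter residue names used in these listings to one-letter codes; both helper names are hypothetical and chosen only for this example.

```python
# Standard three-letter to one-letter amino acid codes
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def cds_codons(start: int, end: int) -> int:
    """Number of codons in an inclusive, 1-based CDS range start..end."""
    length = end - start + 1
    if length % 3:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3

def to_one_letter(listing: str) -> str:
    """Convert space-separated three-letter codes, e.g. 'Met Ser Glu'."""
    return "".join(THREE_TO_ONE[res] for res in listing.split())

print(cds_codons(283, 2307))          # 675, the LENGTH of SEQ ID NO: 52
print(to_one_letter("Arg Ala Asn"))   # RAN, the C-terminal residues
```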
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CLAIMS 

1. An isolated small RNA virus capable of infecting insect species
including Heliothis species. 

2. The virus of claim 1 comprising a genome hybridizable with the
nucleotide sequence of RNA 1 or RNA 2 as herein described. 

3. The virus of claim 1 which comprises proteins which are capable of
generating antibodies, said antibodies being immunologically reactive with the
large coat protein of HaSV as herein described.

4. The virus of claim 1 wherein said virus has a particle size of
approximately 35 to 38 nm and comprises a genome with RNAs of about 5.3
and 2.4 kb in length.

5. The virus of claim 4 wherein said particle comprises coat proteins of
approximately 7 and 64 kDa.

6. The virus of claim 1 wherein said virus is HaSV or a mutant, variant or
derivative thereof as herein described.
7. The virus of claim 6 wherein said virus comprises a nucleic acid
sequence which is an encapsidation sequence, structure or signal with at least
50% nucleotide sequence identity to the corresponding nucleotide sequences of
HaSV.

8. The virus of claim 6 wherein said virus comprises a nucleic acid
sequence which encodes proteins with at least 60% amino acid sequence
identity to the corresponding proteins or polypeptides of HaSV.

9. The virus of claim 6 wherein said virus comprises a nucleic acid
sequence which has at least 50% nucleotide sequence identity to the portions
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of the HaSV genome which encode: amino acid residues 401 to 600 of the
replicase enzyme, or amino acid residues 273 to 435, 50 to 272, or 436 to the
COOH terminus of the capsid protein.
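Claims 7 to 9 recite minimum percentages of nucleotide or amino acid sequence identity to HaSV without specifying how identity is calculated. One common convention, assumed here for illustration only, is to count identical positions in a pairwise alignment and divide by the alignment length. The Python sketch below takes an alignment that has already been produced (equal-length strings, with '-' marking gaps); the function name, the gap handling and the example fragments are assumptions, and alignment tools differ in how they treat gaps and terminal overhangs.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same residue.

    Assumes a pre-computed pairwise alignment of equal length; columns with
    a gap ('-') in either sequence are counted as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)

# Hypothetical aligned fragments, not taken from HaSV
score = percent_identity("MSEHTIAHSI", "MSEHSIAHTI")
print(score)          # 80.0
print(score >= 60)    # True, i.e. above a 60% threshold
```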

10. The virus of claim 1, being that isolated in Example 1 or Example 2
herein or having a genome substantially similar to the virus isolated in
Example 1 or Example 2.

11. An isolated nucleic acid molecule comprising a nucleic acid sequence
hybridizable with RNA 1 or RNA 2 as herein described under low stringency
conditions.

12. The molecule of claim 11 wherein said sequence is hybridizable under
medium stringency conditions.

13. The molecule of claim 12 wherein said sequence is hybridizable under
high stringency conditions.

14. The molecule of claim 11 wherein said sequence encodes P7, P16, P17,
P64, P70, P71, P11a, P11b, P14 or P187 or a mutant, variant or derivative
thereof as herein described.

15. The molecule of claim 14 wherein said sequence encodes P7, P64, or
P71 or a mutant, variant or derivative thereof as herein described.

16. The molecule of claim 11 capable of being used as a probe or primer
for the nucleic acid sequence of RNA 1 or RNA 2, or mutants, variants or
derivatives thereof, said molecule comprising nucleic acid sequences suitable
for detection of, or replication of, RNA 1 or RNA 2, or portions thereof, under
appropriate conditions.
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17. The molecule of claim 16 capable of being used as one of a primer pair,
wherein said primer is derived from a sequence of RNA 1 or RNA 2 which is
located between 300 and 1500 bp from another sequence of RNA 1 or RNA 2,
being the sequence of the other primer of said primer pair.
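Claim 17 places the two primers of a pair 300 to 1500 bp apart on RNA 1 or RNA 2. The Python sketch below illustrates one way such spacing might be checked against a DNA copy of the template. It assumes, as is usual for PCR but not stated in the claim, that the forward primer matches the template strand and the reverse primer matches the complementary strand, and it measures the spacing as the length of the region the pair would span; the function names and the toy template are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G and T."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def amplicon_length(template: str, fwd: str, rev: str) -> Optional[int]:
    """Span from the 5' end of the forward primer site to the far end of the
    site matched by the reverse primer, or None if either site is absent."""
    template = template.upper()
    start = template.find(fwd.upper())
    if start < 0:
        return None
    # The reverse primer anneals to the complementary strand, so its
    # reverse complement is what appears in the template strand.
    rev_site = template.find(reverse_complement(rev), start + len(fwd))
    if rev_site < 0:
        return None
    return rev_site + len(rev) - start

def spacing_ok(length: Optional[int], low: int = 300, high: int = 1500) -> bool:
    """True if the spanned region lies inside the low..high bp window."""
    return length is not None and low <= length <= high

# Toy template (hypothetical, not HaSV sequence)
template = "ACGTACGTAA" + "T" * 400 + "GGCCTTAAGG"
size = amplicon_length(template, "ACGTACGTAA", reverse_complement("GGCCTTAAGG"))
print(size, spacing_ok(size))  # 420 True
```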


18. The molecule of claim 17 comprising the following sequences:

5' GGGGGGAATTCATITAGGTGACACTATAQTrCTGCCTCCCCGGAC 
(called "HvRlSPSp" herein) 


5' GGGGGGATCCTGGTATCCCAGGGGGGC (called "HvR13p" herein)

5' CCGGAAGCTTGTl'n'lUrnUl l lACCA (called "Hr2cdna5" herein) 

5' GGGGGATCCGATGGTATCCCGAGGGACGC
TCAGCAGGTGGCATAGG (called "HvR23p" herein) 

AAATAATTTTY^TTArTTTAnAAnnAnATATArA TATGAGCGAGCGA 
GCACAC (called "HVPET65N" herein)


A A AT A ATTTTnTTTA APrTTA AO A AOO Af? ATrTArA TATrtrTTinAnT 

GGCGTCAC (called "HVPET63N" herein) 

nnAnATrTArA TATnnnArTATf^rTGGAGTG (called "HVPET64N"
herein)

GTAGCGAACGTCGAGAA (called "HVRNA2F3" herein) 

nnnnnATrcrr CAGTTGTCAGTGGCGGGGTAC- (called •TrvT&5C'
herein)

GGGGATrr CTAATTGGrACGAGCGGCGC (called "HVP6C2" herein) 
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AATTACATATCGCGGCCXjCCGITTCTGCC (called "HVP6MA" herein)



AATTACATATGTTCXjCGGCXGCCGTTTCT (called "HVP6MF" herein).
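Several primers in claim 18 can be related to the sequence listing by reverse complementation. For example, SEQ ID NO: 51 (2479 bases, which appears to correspond to RNA 2) ends in ...GCGTCCCTCGGGATACCA; its reverse complement, TGGTATCCCGAGGGACGC, appears in the first printed line of HvR23p downstream of a GGATCC (BamHI) site. The short Python check below uses only sequences printed in this document; it illustrates the relationship and is not a description of how the primers were designed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G and T."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# Last 18 nucleotides of SEQ ID NO: 51 as printed in the listing
seq51_3prime_end = "GCGTCCCTCGGGATACCA"
# First printed line of primer HvR23p (claim 18)
hvr23p_first_line = "GGGGGATCCGATGGTATCCCGAGGGACGC"

binding_site = reverse_complement(seq51_3prime_end)
print(binding_site)                        # TGGTATCCCGAGGGACGC
print(binding_site in hvr23p_first_line)   # True
print(hvr23p_first_line.find("GGATCC"))    # 3, start of the BamHI site
```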

19. The molecule of claim 11 additionally comprising a ribozyme sequence.

20. A vector comprising the molecule of claim 11.

21. A vector comprising the molecule of claim 14.


22. A vector comprising the molecule of claim 15.

23. A vector comprising the molecule of claim 16.

24. A vector comprising the molecule of claim 17.

25. A vector comprising the molecule of claim 18.

26. A vector comprising the molecule of claim 11 capable of replication,
expression and/or encapsidation in an animal, plant or bacterial cell.

27. A vector comprising the molecule of claim 11 capable of transferring
said nucleic acid molecule to a plant cell.

28. The vector of claim 26 or claim 27 which comprises a ribozyme for
facilitating replication, expression or encapsidation of the transcript.

29. The vector of claim 26 or claim 27 having a ribozyme sequence
selected from one of the following sequences:
5' CCATCGATGCCGGACTGGTATCCCAGGGGG (called "HVR1Cla"
herein)
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5' CCATCGATGCCGGACTGGTATCCCGAGGGAC (called "HVR2Cla"
herein)

5' CCATCGATGATCCAGCCTCCTCGCGGCGCCGGATGGGCA (called
"RZHDV1" herein)

5' GCTCTAGATCCATTCGCCATCCGAAGATGCCCATCCGGC (called
"RZHDV2" herein) 

5' CCATCGATTTATGCCXjAGAAGGTAACXiAGAGAAACACAC (called
"RZHC1" herein)

5' GCTCTAGACCAGGTAATATACXACAACGTGTGTTTCTCT (called
"RZHC2" herein)
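The ribozyme-related sequences in claim 29 begin with two extra bases followed by a restriction site: those beginning CCATCGAT contain ATCGAT (ClaI), and those beginning GCTCTAGA contain TCTAGA (XbaI). The Python sketch below scans a sequence for a small, hand-picked set of recognition sites; the enzyme table is an illustrative subset chosen for this example, not a list drawn from the specification, and the dictionary keys are hypothetical labels.

```python
from typing import Dict, List

# A few common type II recognition sequences; all are palindromic, so
# scanning one strand is enough. Illustrative subset only.
SITES = {
    "BamHI": "GGATCC",
    "ClaI": "ATCGAT",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "XbaI": "TCTAGA",
}

def find_sites(seq: str) -> Dict[str, List[int]]:
    """Map each enzyme found in seq to its 0-based site start positions."""
    seq = seq.upper()
    hits: Dict[str, List[int]] = {}
    for enzyme, site in SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits

primers = {
    "claim29_first": "CCATCGATGCCGGACTGGTATCCCAGGGGG",
    "RZHDV2": "GCTCTAGATCCATTCGCCATCCGAAGATGCCCATCCGGC",
}
for name, seq in primers.items():
    print(name, find_sites(seq))
# claim29_first {'ClaI': [2]}
# RZHDV2 {'XbaI': [2]}
```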


30. The vector of claim 26 or claim 27 which comprises a promoter for
facilitating expression, said promoter being selected from the group consisting
of Drosophila promoters, heat shock promoters, baculovirus promoters and CMV promoters.

31. A vector of claim 20 comprising the plasmids pDHVR1, pDHVR1RZ,
pDHVR2, pDHVR2RZ, pl7V71, pl7E71, pPH, pV71, pl7V64, pl7E64, pP64,
pV64, pBacHVR1, pBacHVR1RZ, pBacHVR2, pBacHVR2RZ, pHSPR1,
pHSPR1RZ, pHSPR2, pHSPR2RZ, pSR1(E3)A, pSR1(E3)B, pSR2A, pSR2B,
pSX2P70, pSXR2P70, pSRP2B, pBHVR1B, pBHVR2B, pT7T2P64, pSR2P70,
pT7T2P65, pT7T2P70, pT7T2-P71, pBSKSE3, pBSR15, pBSR25p, pSR25,
phr236P70, phr235P65, pGemP63N, pGemP64N, pGemP65N, pP64N, pP65H,
pTP6MA, pTP6MF, pTP17, pTP17delBB, pP656 or p70G as described herein.

32. A host cell comprising the vector of claim 20.

33. The host cell of claim 32 wherein said cell is an insect cell or a plant
cell.
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34. An isolated protein or polypeptide preparation of the proteins or
polypeptides derivable from the virus claimed in claim 1.

35. The preparation of claim 34 which comprises P7, P16, P17, P64, P70,
P71, P11a, P11b, P14 or P187 or mutants, variants or derivatives as described
herein.

36. The preparation of claim 34 which comprises the large capsid protein or
a mutant, derivative or variant thereof.

37. The preparation of claim 36 which comprises the gut binding domain of
HaSV as herein described.

38. The preparation of claim 37 which comprises the variable regions of
said gut binding domain.

39. An isolated antibody reactive with the protein or polypeptide
preparation of claim 34.

40. An isolated antibody reactive with the protein or polypeptide
preparation of claim 36.

41. An isolated antibody reactive with the protein or polypeptide
preparation of claim 37.

42. An isolated antibody reactive with the protein or polypeptide
preparation of claim 38.

43. The antibody of claim 39 wherein said antibody is a monoclonal
antibody.
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44. The preparation of claim 34 which comprises assembled virus capsid
proteins optionally containing an insecticidally effective agent.

45. A recombinant insect virus vector comprising the nucleic acid molecule
of claim 11.

46. The virus vector of claim 45 comprising material derived from
baculovirus including NPV and GV, entomopoxvirus, or cytoplasmic polyhedrosis
virus.

47. The virus vector of claim 45 wherein said vector is capable of infecting
insect species including Heliothis species.

48. The virus vector of claim 45 comprising one or more nucleic acid
sequences which encode substances which are deleterious to insects.

49. A method of controlling insect attack in a plant comprising
genetically manipulating said plant so that it is capable of expressing HaSV or
mutants, derivatives or variants thereof, or an insecticidally effective portion of
HaSV, mutants, derivatives or variants thereof, and optionally other
insecticidally effective agents, such that insects feeding on the plant are
deleteriously affected.

50. A transgenic plant resistant to insect attack comprising a genome or
subgenome capable of expressing the molecule of claim 11.

51. The plant of claim 50 capable of expressing nucleic acid sequences
encoding one or more substances that are deleterious to insects.

52. A preparation of HaSV or a mutant, variant or derivative thereof, or an
insecticidally effective portion of said HaSV, or mutant, variant or derivative
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thereof, suitable for application to plants, said preparation capable of
imparting an insect protective effect.

53. The plasmid vectors pT7T2b and pT7T2C as described herein. 


54. A method of identifying HaSV or mutants, variants or derivatives
thereof using the molecule of claim 11 or the antibodies of claim 39 to detect
the presence of said HaSV in a sample.
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Fig. 9 



WESTERN BLOTS OF HaSV CAPSID PROTEIN 
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Fig. 10 



DOT-BLOT DETECTION OF HaSV IN FIELD-COLLECTED 
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HaSV capsid strategy:
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HaSV expression in plants: 
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HaSV expression in plants:
the one-way vector 
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HaSV expression in plants:
the one-way vector for a toxin 
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